Section 1. Introduction, Definitions and Examples.

There are many situations arising in analysis, physics and other areas where 
solutions to certain problems are naturally expressed in terms of function space
integrals or expectations of certain functionals with respect to specific stochastic
processes. This representation of the solution can be used for many purposes. 
First of all it can be used to prove the existence of the solution, and it can also 
be used to establish some of the qualitative properties of the solution. One may 
also be able to perform some Monte Carlo simulations to evaluate the integral. 

However, we will take a different point of view in these lectures. Often there 
are one or more parameters in the problem and we have a situation where the 
functional to be integrated as well as the probability measure to be used in the 
integration may depend on these parameters. Asymptotic evaluation of these 
integrals when the parameter becomes large or small is rather useful. Whereas in a 
single integral contributions come from the entire range of integration it is quite 
conceivable that as the parameters approach their extreme values the integration 
process becomes singular in the sense that the major contribution to the integral 
comes from a set whose measure is becoming extremely small. The principle of large 
deviation is the art of determining how small the probabilities of these rare events 
really are. It is then used to identify where the major contribution to the 
integral comes from and leads to a precise estimation of the integral itself. 

Let us examine this by means of a simple example. Let $X_1, X_2, \dots, X_n, \dots$ be a
sequence of independent positive random variables with a common distribution $\alpha$. We
will assume for simplicity that $1 \le X_i \le 2$ with probability 1. Let $\xi_n = X_1 X_2 \cdots X_n$.
Then $\log \xi_n = \sum_{i=1}^n \log X_i$ and $\lim_{n\to\infty} \frac{1}{n} \log \xi_n = \int \log x \, d\alpha(x)$, i.e. with probability
nearly one we have $\xi_n \approx a^n$ where $\log a = \int \log x \, d\alpha(x)$. On the other hand
$E\xi_n = (EX_1)^n = \big(\int x \, d\alpha(x)\big)^n$. By Jensen's inequality $EX_1 \ge \exp(E \log X_1)$ and if $\alpha$ is
nondegenerate the inequality is strict. Therefore the contribution to $E\xi_n$ from
typical sequences which grow like $a^n$ does not account for the growth in $E\xi_n$. Where
does the contribution come from? We might try to analyze what the probability is for
$\xi_n$ to grow like $\lambda^n$. Of course unless $1 \le \lambda \le 2$ this probability is zero. Unless
$\lambda = \exp(E \log X_1)$ this probability goes to zero. It turns out that the probability
goes to zero exponentially rapidly, i.e. like $[\rho(\lambda)]^n$ where $\rho(\lambda)$ can be explicitly
determined as a function of $\lambda$. The quantity $\rho(\lambda)$ is one if $\lambda = \exp(E \log X_1)$,
$0 \le \rho(\lambda) \le 1$ for all values of $\lambda$ and $\rho(\lambda) = 0$ unless $1 \le \lambda \le 2$. The contribution to
the integral $E\xi_n$ from those $\xi_n$ which are like $\lambda^n$ is then $\lambda^n [\rho(\lambda)]^n$. The maximum
contribution comes from that value $\lambda_0$ such that

$$ \lambda_0 \rho(\lambda_0) = \sup_{1 \le \lambda \le 2} \lambda \rho(\lambda) $$

and $EX_1 = \lambda_0 \rho(\lambda_0)$. The principle of large deviation tells us which values of $\xi_n$
contribute to $E\xi_n$ as $n \to \infty$.
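
The example can be checked numerically in one special case. The sketch below assumes, purely for illustration, that $X_i$ takes the values 1 and 2 with probability 1/2 each; then $\xi_n \approx \lambda^n$ with $\lambda = 2^t$ means that a fraction $t$ of the factors equal 2, and $\log \rho(\lambda)$ is minus the relative entropy of $(1-t, t)$ with respect to $(1/2, 1/2)$ (a standard fact for coin tossing, not derived in the text above). The function names are hypothetical.

```python
import math

def rho_log(t):
    """log rho(lambda) for lambda = 2**t when X is 1 or 2 with probability 1/2 each.

    This is minus the relative entropy of (1 - t, t) with respect to (1/2, 1/2).
    """
    def xlogx(u):
        return 0.0 if u == 0.0 else u * math.log(u)
    return -(xlogx(t) + xlogx(1.0 - t) + math.log(2.0))

def best_growth(grid=100000):
    """Maximise lambda * rho(lambda) over lambda = 2**t, t in [0, 1], by grid search."""
    best_t, best_val = 0.0, -math.inf
    for i in range(grid + 1):
        t = i / grid
        val = t * math.log(2.0) + rho_log(t)  # log(lambda * rho(lambda))
        if val > best_val:
            best_t, best_val = t, val
    return best_t, math.exp(best_val)

t0, value = best_growth()
typical = math.exp(0.5 * math.log(2.0))  # a = exp(E log X_1)
print(t0, value, typical)
```

In this special case the maximiser is a fraction of about 2/3 of factors equal to 2, and the maximum of $\lambda\rho(\lambda)$ agrees with $EX_1 = 3/2$, which is strictly larger than the typical growth rate $a = \sqrt{2}$.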

A more complicated example is provided by considering a Markov chain on a
finite state space $X$. For $x, y \in X$ let $\pi(x,y)$ be the probability of transition from
state $x$ to state $y$ in a single step. We will assume for simplicity that $\pi(x,y) > 0$
for all $x, y \in X$. Under this condition we have a unique invariant set of
probabilities $\alpha(x)$, $x \in X$, and we have the usual set of ergodic theorems on the long
term behavior of irreducible finite state space Markov chains. Let us take a
function $V: X \to R$ and consider $E_x \exp[V(X_1) + \cdots + V(X_n)] = J_n(x)$ where $x$ is the
starting point $X_0$ and $X_1, \dots, X_n$ are the states of the Markov chain at times 1
through $n$. If we take a frequency count $f_n(y)$ of the number of visits to the state
$y$ by the string $X_1, \dots, X_n$ then

$$ J_n(x) = E_x \exp\Big[\sum_y V(y) f_n(y)\Big] = E_x \exp\Big[n \sum_y V(y) \frac{f_n(y)}{n}\Big]. $$

By the ergodic theorem $f_n(y)/n$ is very close to $\alpha(y)$ for all states $y$ with a very high
probability. If $\beta(y)$, $y \in X$, is any other probability vector on the state space,
$P_x[f_n(y)/n \approx \beta(y) \text{ for all } y \in X]$ is likely to be very small if $\beta \ne \alpha$. In fact the
probability is exponentially small with a rate $\exp[-nI(\beta)]$ which is independent of
the starting point to within the exponential constant. The contribution to $J_n(x)$
from the strings $X_1, \dots, X_n$ whose relative proportion of visits to the states is
close to $\beta(\cdot)$ is $\exp[n \sum_y V(y)\beta(y) - nI(\beta)]$ so that one expects

(1.1) $$ \lim_{n\to\infty} \frac{1}{n} \log J_n(x) = \sup_\beta \Big[\sum_y V(y)\beta(y) - I(\beta)\Big] $$

where the supremum is taken over all probability vectors $\beta$ on $X$. $I(\alpha)$ of course is
zero and $I(\beta) > 0$ for $\beta \ne \alpha$. We can of course evaluate $J_n(x) = \langle \delta_x, (\pi e^V)^n e \rangle$ where
$e^V$ is the diagonal matrix with $\exp\{V(x)\}$ on the diagonal place corresponding to $x$.
Here $e$ is the vector with all entries equal to one and $\delta_x$ is the vector with 1 at $x$ and 0 everywhere
else. Frobenius theory of positive matrices evaluates

(1.2) $$ \lim_{n\to\infty} \frac{1}{n} \log J_n(x) = \log \lambda(V) $$

where $\lambda(V)$ is the spectral radius of the positive matrix $\{\pi(x,y)\exp[V(y)]\}_{x,y \in X}$. So
we obtain a relation

(1.3) $$ \log \lambda(V) = \sup_\beta \Big[\sum_y V(y)\beta(y) - I(\beta)\Big]. $$

It turns out that $I(\beta)$ and $\log \lambda(V)$ are convex functions of $\beta$ and $V$ respectively and
we have the dual relationship

(1.4) $$ I(\beta) = \sup_V \Big[\sum_y V(y)\beta(y) - \log \lambda(V)\Big]. $$

Technically one proves the exponential rates of decay for sets of interest by
variants of the above formula.
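
The limit (1.2) is easy to observe numerically. The following is a minimal sketch for a two-state chain; the transition matrix `PI` and the function `V` are made-up illustrative values, and the spectral radius is obtained from the characteristic polynomial of a 2x2 matrix rather than from any general eigenvalue routine.

```python
import math

PI = [[0.9, 0.1],
      [0.4, 0.6]]          # hypothetical transition matrix with positive entries
V = [0.0, 1.0]             # hypothetical function V on the two states

# M(x, y) = pi(x, y) * exp(V(y)), so that J_n(x) = (M^n e)(x).
M = [[PI[x][y] * math.exp(V[y]) for y in range(2)] for x in range(2)]

def log_J(n, x):
    """Return log J_n(x), renormalising at every step to avoid overflow."""
    u = [1.0, 1.0]          # the vector e of ones
    log_scale = 0.0
    for _ in range(n):
        u = [M[i][0] * u[0] + M[i][1] * u[1] for i in range(2)]
        s = max(u)
        u = [c / s for c in u]
        log_scale += math.log(s)
    return log_scale + math.log(u[x])

def spectral_radius_2x2(m):
    """Largest eigenvalue of a positive 2x2 matrix from its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))

n = 2000
for x in range(2):
    print(x, log_J(n, x) / n, math.log(spectral_radius_2x2(M)))
```

For both starting points the normalised logarithm approaches $\log \lambda(V)$, with an error of order $1/n$ that depends on the starting point only through a bounded constant.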

We will conclude this lecture by providing a formal definition of what we mean 
by the principle of large deviation. The definition is formulated in terms of 
exponential rates of decay. 

Let $X$ be a complete separable metric space and let $P_n$ be a sequence of
probability measures on the Borel $\sigma$-field of $X$. A function $I: X \to [0,\infty]$ is
called a rate function if

(i) $0 \le I(x) \le \infty$,

(ii) $I(x)$ is lower semi-continuous on $X$, and

(iii) the level sets $A_\ell = \{x : I(x) \le \ell\}$ are compact sets in $X$.

The sequence $\{P_n\}$ is said to obey a principle of large deviation with rate function
$I(\cdot)$ if

a) for every closed set $C \subset X$

$$ \limsup_{n\to\infty} \frac{1}{n} \log P_n(C) \le -\inf_{x \in C} I(x) $$

and

b) for every open set $G \subset X$

$$ \liminf_{n\to\infty} \frac{1}{n} \log P_n(G) \ge -\inf_{x \in G} I(x). $$

It follows then that if $A$ is a Borel set with

$$ \inf_{x \in A^\circ} I(x) = \inf_{x \in \bar{A}} I(x) $$

then

$$ \lim_{n\to\infty} \frac{1}{n} \log P_n(A) = -\inf_{x \in A} I(x). $$

Here $A^\circ$ and $\bar{A}$ are respectively the interior and closure of $A$.

Sometimes one runs into a situation where the rate function satisfies only (i)
and (ii), and only a weak form of the large deviation principle holds, with a) replaced by

a') for every compact set $K \subset X$

$$ \limsup_{n\to\infty} \frac{1}{n} \log P_n(K) \le -\inf_{x \in K} I(x), $$

while b) stays the same. In such situations it is difficult to aggregate the local
estimates provided by the rate function $I(\cdot)$ into a global estimate. Of course if
the space $X$ is compact then there is no difference between the two.
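
As a concrete instance of the definition, the sketch below takes $P_n$ to be the law of $S_n/n$ for fair coin tossing and the closed set $C = [a, 1]$. It assumes the standard fact (not proved in this section) that the rate function is then the relative entropy of Bernoulli($a$) with respect to Bernoulli(1/2); the function names are hypothetical. The tail probability is computed exactly with integer arithmetic.

```python
import math

def rate(a, p=0.5):
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

def log_tail(n, a):
    """Exact log P(S_n / n >= a) for S_n ~ Binomial(n, 1/2)."""
    k0 = math.ceil(n * a)
    count = sum(math.comb(n, k) for k in range(k0, n + 1))
    return math.log(count) - n * math.log(2.0)

a = 0.7
for n in (100, 500, 2000):
    print(n, log_tail(n, a) / n, -rate(a))
```

The normalised logarithm of the probability approaches $-\inf_{x \ge a} I(x) = -I(a)$ as $n$ grows; the remaining gap is the sub-exponential prefactor, which the large deviation principle ignores.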


Section 2. Basic General Facts. 

In this lecture we will establish some of the basic implications that follow 
from the principle of large deviations. 

Let $P_n$ satisfy the principle of large deviations on a complete separable metric
space $X$ with a rate function $I(\cdot)$. Let $F(\cdot)$ be a bounded continuous function on $X$.
Then

Theorem 2.1.

$$ \lim_{n\to\infty} \frac{1}{n} \log \int \exp[nF(x)]\,dP_n(x) = \sup_x [F(x) - I(x)]. $$

Proof: Given any $\delta > 0$ we can find a finite number $C_1, C_2, \dots, C_N$
of closed sets such that the oscillation of $F(x)$ on $C_j$ is less than $\delta$
for every $j$ and such that $C_1, \dots, C_N$ cover $X$. Then

$$ \int_X \exp[nF(x)]\,dP_n \le \sum_{j=1}^N \int_{C_j} \exp[nF(x)]\,dP_n \le \sum_{j=1}^N \exp\big[n \sup_{x \in C_j} F(x)\big] P_n(C_j) \le \sum_{j=1}^N \exp\big[n \inf_{x \in C_j} F(x) + n\delta\big] P_n(C_j). $$

Therefore

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_X \exp[nF(x)]\,dP_n \le \sup_{1 \le j \le N} \big[\inf_{x \in C_j} F(x) + \delta - \inf_{x \in C_j} I(x)\big] \le \sup_{1 \le j \le N} \big[\sup_{x \in C_j} \{F(x) - I(x)\} + \delta\big] = \sup_{x \in X} [F(x) - I(x)] + \delta. $$

Since $\delta > 0$ is arbitrary we have

(2.1) $$ \limsup_{n\to\infty} \frac{1}{n} \log \int_X \exp[nF(x)]\,dP_n \le \sup_{x \in X} [F(x) - I(x)]. $$

On the other hand if $x \in X$ is arbitrary and $U$ is a neighborhood of $x$ with
$F(y) \ge F(x) - \varepsilon$ for $y \in U$ then

$$ \int_X \exp[nF(y)]\,dP_n \ge \int_U \exp[nF(y)]\,dP_n \ge \exp[n(F(x) - \varepsilon)]\,P_n(U). $$

Therefore

$$ \liminf_{n\to\infty} \frac{1}{n} \log \int_X \exp[nF(y)]\,dP_n \ge F(x) - \varepsilon - \inf_{y \in U} I(y) \ge F(x) - I(x) - \varepsilon. $$

Since $x \in X$ and $\varepsilon > 0$ are arbitrary we obtain

(2.2) $$ \liminf_{n\to\infty} \frac{1}{n} \log \int_X \exp[nF(y)]\,dP_n(y) \ge \sup_x [F(x) - I(x)]. $$

(2.1) and (2.2) establish the theorem.
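
Theorem 2.1 can be checked in a case where both sides are computable. The sketch below again assumes fair coin tossing, with $P_n$ the law of $S_n/n$ on $[0,1]$ and rate function equal to relative entropy with respect to (1/2, 1/2), and takes the linear function $F(x) = \theta x$ (an illustrative choice). The left side is then $\log((1 + e^\theta)/2)$ for every $n$, and the right side is found by a grid search.

```python
import math

def I(x):
    """Rate function of S_n / n for fair coin tossing."""
    def xlogx(u):
        return 0.0 if u == 0.0 else u * math.log(u)
    return xlogx(x) + xlogx(1.0 - x) + math.log(2.0)

def variational_side(theta, grid=100000):
    """sup over x in [0, 1] of F(x) - I(x) with F(x) = theta * x, by grid search."""
    return max(theta * (i / grid) - I(i / grid) for i in range(grid + 1))

def integral_side(theta):
    """(1/n) log E exp(n F(S_n / n)); for i.i.d. fair coins this is the same for every n."""
    return math.log(0.5 * (1.0 + math.exp(theta)))

for theta in (-2.0, 0.5, 3.0):
    print(theta, integral_side(theta), variational_side(theta))
```

The two columns agree up to the resolution of the grid, which illustrates how the supremum of $F - I$ selects the part of the space that dominates the integral.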
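Sometimes one has to use a more complex version of Theorem 2.1. We will state
and prove the part of this version relevant for upper bounds.

Theorem 2.2. Let $F_n(x)$ be a sequence of nonnegative functions such that for some
lower semicontinuous nonnegative function $F(x)$

$$ \liminf_{n\to\infty} F_n(x_n) \ge F(x) $$

for every sequence $x_n \to x$, and every $x \in X$. Then

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_X \exp[-nF_n(x)]\,dP_n \le -\inf_{x \in X} [F(x) + I(x)]. $$

Proof: Let $\ell = \inf_{x \in X} [F(x) + I(x)]$. For any $\delta > 0$ and $x \in X$ there is a neighborhood
$U_{\delta,x}$ of $x$, with closure $\bar{U}_{\delta,x}$, such that

$$ \inf_{y \in \bar{U}_{\delta,x}} I(y) \ge I(x) - \delta $$

and

$$ \liminf_{n\to\infty} \inf_{y \in \bar{U}_{\delta,x}} F_n(y) \ge F(x) - \delta. $$

Therefore as $n \to \infty$

$$ \int_{\bar{U}_{\delta,x}} \exp[-nF_n(y)]\,dP_n \le P_n(\bar{U}_{\delta,x}) \exp[-n(F(x) - \delta) + o(n)]. $$

Since $\bar{U}_{\delta,x}$ is a closed set,

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_{\bar{U}_{\delta,x}} \exp[-nF_n(y)]\,dP_n \le -[F(x) - \delta] - [I(x) - \delta] \le -\inf_{x \in X} [F(x) + I(x)] + 2\delta. $$

If $K$ is any compact set in $X$, then a finite union of $U_{\delta,x}$ as $x$ varies will cover $K$.
Let us call this finite union $U_\delta$. We have

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_{U_\delta} \exp[-nF_n(y)]\,dP_n \le -\inf_{x \in X} [F(x) + I(x)] + 2\delta = -\ell + 2\delta. $$

Let us pick $K = \{x : I(x) \le k\}$ where $k \gg \ell$. Then, since $F_n \ge 0$ and $X - U_\delta$ is closed,

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_{X - U_\delta} \exp[-nF_n(y)]\,dP_n \le \limsup_{n\to\infty} \frac{1}{n} \log P_n[X - U_\delta] \le -\inf_{x \in X - U_\delta} I(x) \le -k. $$

Therefore

$$ \limsup_{n\to\infty} \frac{1}{n} \log \int_X \exp[-nF_n(y)]\,dP_n \le \max[-\ell + 2\delta, -k]. $$

Since $k < \infty$ is arbitrary and $\delta > 0$ is arbitrary we let $k \to \infty$ and $\delta \to 0$ to obtain our
theorem.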

A situation that comes up often in applications is the following: $P_n$ is a
sequence of probability measures on $X$ satisfying a large deviation principle with a
rate function $I(\cdot)$. We have a continuous map $F: X \to Y$ into another complete
separable metric space. We denote by $Q_n = P_n F^{-1}$ the image of $P_n$ on $Y$ under $F$.
One can ask if $Q_n$ satisfies a large deviation principle and if so what is the
relation of its rate function to $I(\cdot)$.

Theorem 2.3. $Q_n$ satisfies a large deviation principle with a rate function $I'(y)$
given by

$$ I'(y) = \inf_{x : F(x) = y} I(x). $$

Proof: It follows from the properties of the rate function $I(\cdot)$ that $I'(\cdot)$ is a rate
function on $Y$. Moreover for any closed set $C \subset Y$

$$ \limsup_{n\to\infty} \frac{1}{n} \log Q_n(C) = \limsup_{n\to\infty} \frac{1}{n} \log P_n(F^{-1}C) \le -\inf_{x \in F^{-1}(C)} I(x) = -\inf_{y \in C} I'(y). $$

The lower bound for open sets is similar. We will refer to this theorem as the
contraction principle.

The theorems in the theory of large deviations are fairly stable under
reasonable perturbations; for instance if we assume that $P_n$ satisfies a large
deviation principle with rate $I(\cdot)$ and $F_n$ are continuous maps from $X \to Y$ converging
uniformly on compact sets of $X$ to $F$ then for the image $Q_n = P_n F_n^{-1}$ we have again a
theorem.

Theorem 2.4. $Q_n = P_n F_n^{-1}$ satisfies a large deviation principle with the rate
function

$$ I'(y) = \inf_{x : F(x) = y} I(x). $$

Proof: Let $A \subset Y$ be closed. Let $C_n = \{x : F_n(x) \in A\}$; then

$$ P_n(C_n) = Q_n(A). $$

If we let $C = \{x : F(x) \in A\}$, from the uniform convergence of $F_n$ to $F$ on compact sets
it follows that given any open set $U$ containing $C$ and any compact $K \subset X$ there is a
neighborhood $K^\delta$ of $K$ such that

$$ C_n \cap K^\delta \subset U \text{ for sufficiently large } n. $$

Therefore for $n$ large enough

$$ P_n(C_n) \le P_n(U) + P_n(X - K^\delta) \le P_n(\bar{U}) + P_n(X - K^\delta). $$

If we take $K = \{x : I(x) \le \ell\}$ we obtain

$$ \limsup_{n\to\infty} \frac{1}{n} \log P_n(C_n) \le \max\big[-\inf_{x \in \bar{U}} I(x), -\ell\big]. $$

We let $\ell \to \infty$ and $U \downarrow C$ so that

$$ \lim_{U \downarrow C} \inf_{x \in \bar{U}} I(x) = \inf_{x \in C} I(x) $$

and we obtain our desired result.

Let us now take $G \subset Y$ to be open. Let $y \in G$ be arbitrary and $x \in X$ be such
that $F(x) = y$. Since $F_n(x)$ tends to $F(x)$ uniformly on compact sets we can find a
neighborhood $V$ of $x$ in $X$ such that $F_n(V) \subset G$ for $n$ large enough. Therefore for
sufficiently large $n$

$$ Q_n(G) \ge P_n(V) $$

and

$$ \liminf_{n\to\infty} \frac{1}{n} \log Q_n(G) \ge \liminf_{n\to\infty} \frac{1}{n} \log P_n(V) \ge -I(x). $$

Since this is true for every $x \in X$ such that $y = F(x) \in G$ we have our theorem.


Section 3. Large deviations for stationary stochastic processes. 
Our lectures deal mainly with large deviations of various ergodic phenomena.
Let us take a sequence $X_1, X_2, \dots, X_n, \dots$ of real valued random variables which form
an ergodic stationary sequence in the strict sense. We can extend the process to
nonpositive integers and obtain a stationary process $\{X_j\}$, $-\infty < j < \infty$. The measure
$P$ corresponding to such a process is a translation invariant ergodic measure on the
doubly infinite sequences of real numbers. For every $n$ we have the random variable

$$ \xi_n = \frac{X_1 + \cdots + X_n}{n} $$

and this will have a distribution $Q_n$ under $P$. The ergodic theorem
asserts that if $E^P|X_1| < \infty$ then $Q_n$ for large $n$ is close to the degenerate
distribution at $a = E^P X_1 \in R$. Hopefully under suitable assumptions on the underlying
process $P$, $Q_n$ will satisfy a large deviation principle on $R$ with a rate function
$h(x)$, $x \in R$. Since the entire probability is getting concentrated at $x = a$ we
expect $h(a) = 0$ and $h(x) > 0$ for $x \ne a$.

We can consider a more general situation in which the stationary process $\{X_j\}$
takes values in an arbitrary Polish space $X$. We can take for our $\xi_n$ the random
variable

$$ \xi_n(f) = \frac{f(X_1) + \cdots + f(X_n)}{n} $$

where $f$ is a bounded continuous function on $X$. Again
we can expect to strengthen the ergodic theorem by establishing a large deviation
principle for the distributions of $\xi_n(f)$ under $P$ on $R$ with a rate function $h_f(y)$,
$y \in R$. We can be more ambitious and consider a vector $f = (f_1, \dots, f_d)$ of functions on $X$
and expect for the random vector $\xi_n(f)$, or for its distribution $Q_n^f$ on $R^d$ under $P$, a
large deviation principle with a rate function $h_f(y)$, $y \in R^d$.

In fact one should be even more ambitious and consider the random variable

$$ \xi_n = \frac{\delta_{X_1} + \cdots + \delta_{X_n}}{n} $$

as a map from the space $\Omega$ of infinite sequences into the space of probability
measures on $X$. $\xi_n$ is then the empirical distribution of the process based on the
first $n$ random variables. The distribution of $\xi_n$ is then a measure $Q_n$ on the space $M$
of probability measures on $X$. The ergodic theorem is still valid and asserts that
$\xi_n$ converges almost surely (in the sense of weak convergence in the space $M$) with
respect to $P$ to the measure $\alpha$ which is the one dimensional distribution of the
random variable $X_1$ under $P$. We may again expect for $Q_n$ a principle of large
deviation to hold in the space $M$ with a rate function $I(\mu)$, $\mu \in M$. Needless to say,
we expect $I(\alpha) = 0$ and $I(\mu) > 0$ for $\mu \ne \alpha$. If $f$ is a bounded continuous vector
valued function on $X$ (with values in $R^d$) then

$$ \frac{f(X_1) + \cdots + f(X_n)}{n} = \int f(x)\,d\xi_n(x). $$

Therefore the map $\mu \to \int f\,d\mu$ from $M \to R^d$ maps $Q_n$ onto $Q_n^f$. By the contraction
principle one may deduce the large deviation principle for $Q_n^f$ from that of $Q_n$. The
relation between the rate functions is provided by Theorem 2.3. Therefore

$$ h_f(y) = \inf_{\mu : \int f\,d\mu = y} I(\mu). $$
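
The contraction formula can be tested on a three-point space. The sketch below assumes the independent case, in which $I(\mu)$ is the relative entropy of $\mu$ with respect to $\alpha$ (this is the entropy $h(\mu;\alpha)$ studied in Section 4), takes $f(x) = x$ on the states $\{0, 1, 2\}$, and compares the constrained infimum with the Legendre transform of the log moment generating function, which is the classical form of the rate function for sample means. The vector `ALPHA` and all function names are illustrative assumptions.

```python
import math

ALPHA = [0.5, 0.3, 0.2]    # hypothetical law of X_1 on the states {0, 1, 2}

def xlog(u, v):
    """u * log(u / v) with the convention 0 * log 0 = 0."""
    return 0.0 if u == 0.0 else u * math.log(u / v)

def relative_entropy(mu):
    return sum(xlog(m, a) for m, a in zip(mu, ALPHA))

def minimise_convex(func, lo, hi, iters=200):
    """Ternary search for the minimum value of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return func(0.5 * (lo + hi))

def contracted_rate(y):
    """inf of I(mu) over probability vectors mu on {0, 1, 2} with mean y."""
    # mu = (1 - y + s, y - 2s, s) parametrises the constraint set by s = mu(2).
    lo, hi = max(0.0, y - 1.0), y / 2.0
    return minimise_convex(
        lambda s: relative_entropy([1.0 - y + s, y - 2.0 * s, s]), lo, hi)

def cramer_rate(y):
    """sup over theta of theta * y - log E exp(theta * X_1)."""
    def neg(theta):
        mgf = sum(a * math.exp(theta * k) for k, a in enumerate(ALPHA))
        return math.log(mgf) - theta * y
    return -minimise_convex(neg, -30.0, 30.0)

for y in (0.3, 0.7, 1.2, 1.8):
    print(y, contracted_rate(y), cramer_rate(y))
```

The two rates coincide, and both vanish at $y = 0.7$, the mean of $\alpha$, in line with the expectation that $h_f$ is zero only at the ergodic limit.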
We have so far looked at the ergodic theorems for random variables of the form

$$ \frac{f(X_1) + \cdots + f(X_n)}{n}. $$

But the ergodic theorems apply equally well for random variables of the form

(3.1) $$ \frac{g(X_1, X_2) + \cdots + g(X_n, X_{n+1})}{n}. $$

Continuing on in the same spirit we might want to look at the map

$$ \xi_n^{(2)} = \frac{\delta_{(X_1, X_2)} + \cdots + \delta_{(X_n, X_{n+1})}}{n} $$

from $\Omega$ into $M^{(2)}$, the space of measures on $X \times X$ or $X^{(2)}$. We may expect again a large
deviation principle with a rate function $I^{(2)}(\beta)$ for $\beta \in M^{(2)}$. There is nothing
special about 2 and we may take

$$ \xi_n^{(k)} = \frac{\delta_{(X_1, \dots, X_k)} + \cdots + \delta_{(X_n, \dots, X_{n+k-1})}}{n}. $$

In fact we should abandon all restraint and consider $\xi_n^{(k)}$ for the totality of all
possible values of $k$. This has to be done with a little care. For every sequence

$$ \omega = (\dots, x_{-1}, x_0, x_1, \dots) \in \Omega $$

and every $n$ let us consider the sequence

$$ \omega^{(n)} = (\dots, x_1, \dots, x_n, x_1, \dots, x_n, x_1, \dots, x_n, \dots). $$

Formally the $i$-th coordinate of $\omega^{(n)}$ is given by

$$ x_i^{(n)} = x_i \text{ if } 1 \le i \le n, \qquad x_{i+n}^{(n)} = x_i^{(n)} \text{ for all } i \text{ and } n. $$

In other words we keep the chunk of $\omega$ from $x_1$ through $x_n$ and make it periodic
outside with period $n$. If we look at all the periodic sequences of period $n$ in $\Omega$ and
denote this set by $\Omega^{(n)}$ then the map $\omega \to \omega^{(n)}$ defines a map $\pi_n$ from $\Omega \to \Omega^{(n)}$. Given
any point $\omega^{(n)}$ in $\Omega^{(n)}$, denoting by $T$ the shift in $\Omega$, $\omega^{(n)}, T\omega^{(n)}, \dots, T^{n-1}\omega^{(n)}$ is a
periodic orbit in $\Omega^{(n)}$ and

$$ \frac{1}{n}\big(\delta_{\omega^{(n)}} + \delta_{T\omega^{(n)}} + \cdots + \delta_{T^{n-1}\omega^{(n)}}\big) $$

defines a $T$ invariant measure
on $\Omega^{(n)}$. This is of course a stationary stochastic process on $\Omega^{(n)}$ and since
$\Omega^{(n)} \subset \Omega$ this is a stationary process on $\Omega$. In this manner for each $n$ and $\omega$ we have
defined a stationary process

$$ R_{n,\omega} = \frac{1}{n}\big(\delta_{\omega^{(n)}} + \delta_{T\omega^{(n)}} + \cdots + \delta_{T^{n-1}\omega^{(n)}}\big). $$

If $g(x_1, x_2)$ is viewed as a map from $\Omega \to R$ then

$$ \int g\,dR_{n,\omega} = \frac{1}{n}\big[g(x_1, x_2) + \cdots + g(x_{n-1}, x_n) + g(x_n, x_1)\big]. $$

This is not quite the same as what we have in (3.1) but the difference is just one
term in $n$ and becomes negligible as $n \to \infty$. The ergodic theorem again tells us that

$$ P\big[\lim_{n\to\infty} R_{n,\omega} = P\big] = 1. $$
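
The periodization is easy to carry out on a finite chunk of data. The following sketch computes the two dimensional marginal of $R_{n,\omega}$ for a sample sequence and compares the integral of a function $g$ with the average (3.1). The alphabet, the sample, the function `g` and the helper names are all illustrative assumptions.

```python
from collections import Counter
import random

def cyclic_pair_measure(chunk):
    """Two-dimensional marginal of R_{n,w}: law of (x_i, x_{i+1}) along the periodised chunk."""
    n = len(chunk)
    counts = Counter((chunk[i], chunk[(i + 1) % n]) for i in range(n))
    return {pair: c / n for pair, c in counts.items()}

def marginal(measure, coord):
    """One-dimensional marginal of a measure on pairs."""
    out = Counter()
    for pair, mass in measure.items():
        out[pair[coord]] += mass
    return dict(out)

def g(x, y):
    return 1.0 if x == y else 0.0   # hypothetical bounded function of two coordinates

random.seed(0)
n = 1000
omega = [random.choice("ab") for _ in range(n + 1)]   # x_1, ..., x_{n+1}

R2 = cyclic_pair_measure(omega[:n])
cyclic_average = sum(g(x, y) * mass for (x, y), mass in R2.items())
plain_average = sum(g(omega[i], omega[i + 1]) for i in range(n)) / n   # the average (3.1)
print(cyclic_average, plain_average)
print(marginal(R2, 0), marginal(R2, 1))
```

Because the chunk is wrapped around, the two one dimensional marginals of the pair measure coincide, which is the stationarity of $R_{n,\omega}$ seen at the level of pairs; the two averages differ only through the single wrapped term $g(x_n, x_1)$.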


We might as well expect a large deviation principle for the distribution $\hat{Q}_n$ of $R_{n,\omega}$
under $P$. Now the state space is the space $M_S$ of all stationary stochastic processes
and we expect a rate function $H(Q)$ for $Q \in M_S$ which is equal to zero only when
$Q = P$. There is of course a natural map from $M_S \to M$ which assigns to any stationary
process its common one dimensional marginal distribution. If we call this map $\tau$ then

$$ \tau R_{n,\omega} = \xi_n = \frac{1}{n}\big(\delta_{x_1} + \cdots + \delta_{x_n}\big). $$

Since the map $\tau$ is continuous from $M_S \to M$ the contraction principle applies and we can
have a large deviation principle in $M$ if we have one in $M_S$. The rate functions are
of course related by

$$ I(\mu) = \inf_{Q : \tau Q = \mu} H(Q). $$

Of course to actually carry all of this out requires serious assumptions on the
nature of the underlying stationary process $P$. We will, during these lectures,
start with the special case of independent random variables or product measure for
$P$. Then, we will look at the case of a stationary Markov chain. We will also look
at stationary Gaussian processes. We will then extend these results to the case of
continuous time Markov processes. Towards the end we will look at some applications
of the theories developed here.


Section 4. Independent Random Variables. 


Throughout these lectures, in all instances the rate functions will have a
close connection with some sort of entropy. It is therefore important for us to
spend some time establishing some of the properties of entropy.

Definition: Given any two probability measures $\beta$ and $\alpha$ on a measure space $(X, \Sigma)$ we
define the entropy of $\beta$ with respect to $\alpha$ as

(4.1) $$ h(\beta;\alpha) = \sup_{V \in B_\Sigma} \Big[\int V(x)\,d\beta(x) - \log \int e^{V(x)}\,d\alpha(x)\Big] $$

where $B_\Sigma$ is the space of all bounded measurable functions on $X$.
This definition is the same as relative entropy or Kullback-Leibler information
number. This is the content of the following theorem:

Theorem 4.1. The following two statements are equivalent:
a) $h(\beta;\alpha) = \ell < \infty$.
b) $\beta \ll \alpha$ and if $f(x) = \frac{d\beta}{d\alpha}$ then $f(x) \log f(x)$ is integrable with respect to $\alpha$ and

$$ \int f(x) \log f(x)\,d\alpha(x) = \ell. $$

Proof: Let us first assume that b) holds for some finite $\ell$. Then using

$$ xy \le y \log y + e^{x-1} \quad \text{valid for } x \text{ real and } y > 0, $$

$$ \int V(x)\,d\beta(x) = \int V(x) f(x)\,d\alpha(x) \le \int f(x) \log f(x)\,d\alpha(x) + \frac{1}{e} \int e^{V(x)}\,d\alpha(x). $$

We can write $V(x) = (V(x) - k) + k$. Then

$$ \int V(x)\,d\beta(x) \le \int f(x) \log f(x)\,d\alpha(x) + e^{-(k+1)} \int e^{V(x)}\,d\alpha(x) + k. $$

We pick $k = \log \int e^{V(x)}\,d\alpha(x) - 1$. This yields

$$ \int V(x)\,d\beta(x) \le \ell + \log \int e^{V(x)}\,d\alpha(x). $$

Since $V$ is arbitrary we establish a).

Let us now assume that a) is true. First we want to show that $\beta \ll \alpha$. Let $A$
be any set. Take $V(x) = k\chi_A(x)$. We get from a)

$$ k\beta(A) \le \ell + \log(1 - \alpha(A) + \alpha(A)e^k) $$

or

$$ \beta(A) \le \frac{1}{k}\big[\ell + \log(1 + \alpha(A)(e^k - 1))\big], $$

so that

$$ \beta(A) \le \inf_{k > 0} \frac{1}{k}\big[\ell + \log(1 + \alpha(A)(e^k - 1))\big] = \psi(\ell, \alpha(A)) $$

where

(4.2) $$ \psi(\ell, \delta) = \inf_{k > 0} \frac{1}{k}\big[\ell + \log(1 + \delta(e^k - 1))\big]. $$

It is clear that $\psi(\ell, \delta) \to 0$ as $\delta \to 0$ for each fixed $\ell < \infty$ so that not only is $\beta \ll \alpha$
but any time $\ell$ is controlled such $\beta$ are uniformly absolutely continuous with respect
to $\alpha$. We now have, for every $V \in B_\Sigma$,

$$ \int V(x) f(x)\,d\alpha(x) \le \ell + \log \int e^{V(x)}\,d\alpha(x). $$

We would like to take $V(x) = \log f(x)$ to obtain b). But we do not know that
$f(x) \log f(x)$ is integrable with respect to $\alpha$. In any case we may only take bounded
$V$. We pick

$$ V_{\varepsilon,k} = \log[(f \wedge k) \vee \varepsilon]. $$

We let $\varepsilon \to 0$ and then $k \to \infty$. Since $f \log f$ is bounded near $f = 0$, letting $\varepsilon \to 0$ is no
problem. Finally as $k \to \infty$ we use the monotone convergence theorem to establish our
result. The details are left as an exercise.
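
On a finite space the equivalence in Theorem 4.1 can be verified directly. The sketch below evaluates the functional inside the supremum of (4.1) at $V = \log f$ and at many random bounded $V$; the two measures `BETA` and `ALPHA` are made-up illustrative values on a three-point space.

```python
import math
import random

BETA = [0.2, 0.5, 0.3]     # hypothetical measures on a three-point space
ALPHA = [0.4, 0.4, 0.2]

def functional(V):
    """The quantity inside the supremum in (4.1) for a given bounded V."""
    return (sum(v * b for v, b in zip(V, BETA))
            - math.log(sum(math.exp(v) * a for v, a in zip(V, ALPHA))))

kl = sum(b * math.log(b / a) for b, a in zip(BETA, ALPHA))   # integral of f log f d(alpha)
V_star = [math.log(b / a) for b, a in zip(BETA, ALPHA)]      # V = log f

random.seed(1)
trials = [functional([random.uniform(-5.0, 5.0) for _ in BETA]) for _ in range(10000)]
print(kl, functional(V_star), max(trials))
```

The choice $V = \log f$ attains the value $\int f \log f\,d\alpha$, and no random trial exceeds it, as the two halves of the proof predict.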

Sometimes when we are dealing with a Polish space $X$ and its Borel sets for $\Sigma$,
it is convenient to have the following lemma:

Lemma 4.2. If $X$ is any Polish space the supremum in definition (4.1) can be taken
over the class of bounded continuous functions and we will still have the same
supremum as over all bounded measurable functions.

Proof: A trivial application of Lusin's theorem.

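Suppose we have two probability measures $\beta$ and $\alpha$ on $(X, \Sigma)$ and a sub $\sigma$-field
$\Sigma_0 \subset \Sigma$. Then we may just look at $\beta$ and $\alpha$ on $\Sigma_0$ and, restricting $V$ to be bounded and
measurable with respect to $\Sigma_0$, we obtain what we might call $h_{\Sigma_0}(\beta;\alpha)$. In other words
the relative entropy is also with respect to a specified $\sigma$-field which may only be a
sub $\sigma$-field. Obviously if $\Sigma_1 \subset \Sigma_2$ then $h_{\Sigma_1}(\beta;\alpha) \le h_{\Sigma_2}(\beta;\alpha)$. We want to interpret
the difference again as an entropy. Let us suppose that $\beta$ and $\alpha$ possess regular
conditional probabilities $\beta_\omega$ and $\alpha_\omega$ given the $\sigma$-field $\Sigma_1$. Then

Lemma 4.3. We have the following identity

$$ h_{\Sigma_2}(\beta;\alpha) = h_{\Sigma_1}(\beta;\alpha) + E^\beta h_{\Sigma_2}(\beta_\omega;\alpha_\omega). $$

Proof: We can assume without loss of generality that $h_{\Sigma_1}(\beta;\alpha) < \infty$. Otherwise
$h_{\Sigma_2}(\beta;\alpha) \ge h_{\Sigma_1}(\beta;\alpha) = \infty$ and the identity is valid because both sides are infinite.
If $h_{\Sigma_1}(\beta;\alpha) < \infty$ then $\beta \ll \alpha$ on $\Sigma_1$ and therefore $\alpha_\omega$ is defined not only almost
everywhere with respect to $\alpha$, but $\beta$ as well. Therefore the second term on the right
makes sense.

For any $V$ bounded and $\Sigma_2$ measurable

$$ E^\beta[V(x)] = E^\beta E^{\beta_\omega}[V(x)] $$
$$ \le E^\beta\big[\log E^{\alpha_\omega} e^{V(x)}\big] + E^\beta h_{\Sigma_2}(\beta_\omega;\alpha_\omega) $$
$$ \le h_{\Sigma_1}(\beta;\alpha) + \log E^\alpha E^{\alpha_\omega} e^{V(x)} + E^\beta h_{\Sigma_2}(\beta_\omega;\alpha_\omega) $$
$$ = \log E^\alpha e^{V(x)} + h_{\Sigma_1}(\beta;\alpha) + E^\beta h_{\Sigma_2}(\beta_\omega;\alpha_\omega). $$

This proves one half of the lemma. As for the other half we note that if $\beta \ll \alpha$ on $\Sigma_2$
then

$$ \frac{d\beta}{d\alpha}\Big|_{\Sigma_2} = \frac{d\beta}{d\alpha}\Big|_{\Sigma_1} \cdot \frac{d\beta_\omega}{d\alpha_\omega}\Big|_{\Sigma_2}. $$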
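
The identity of Lemma 4.3 can be checked by hand on a product of two two-point spaces, with $\Sigma_1$ generated by the first coordinate and $\Sigma_2$ by both. The joint laws below are made-up illustrative values, and the helper names are hypothetical.

```python
import math

# Hypothetical joint laws of a pair (x1, x2) on {0, 1} x {0, 1}.
BETA = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
ALPHA = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.3, (1, 1): 0.2}

def kl(p, q):
    """Relative entropy of p with respect to q on a finite set."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0.0)

def first_marginal(joint):
    return {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}

def conditional(joint, a):
    """Law of x2 given x1 = a."""
    total = joint[(a, 0)] + joint[(a, 1)]
    return {b: joint[(a, b)] / total for b in (0, 1)}

beta1, alpha1 = first_marginal(BETA), first_marginal(ALPHA)
h_fine = kl(BETA, ALPHA)                  # entropy on the sigma-field of (x1, x2)
h_coarse = kl(beta1, alpha1)              # entropy on the sigma-field of x1 alone
h_conditional = sum(beta1[a] * kl(conditional(BETA, a), conditional(ALPHA, a))
                    for a in (0, 1))      # E^beta of the conditional entropy
print(h_fine, h_coarse + h_conditional)
```

The entropy on the finer $\sigma$-field splits exactly into the entropy on the coarser one plus the $\beta$-average of the conditional entropies.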

Let us now suppose that we have a product measure $P$ on $\Omega = \prod X_i$ with marginal
distribution $\alpha$ on each $X_i$, which are copies of $X$. Let $Q$ be any stationary process on
$\Omega$. We denote by $\mathcal{F}^n_m$ the $\sigma$-field generated by the coordinates $x_i$ of $\omega \in \Omega$ for
$n \le i \le m$. If $m = \infty$ the $\sigma$-field is denoted by $\mathcal{F}^n$ and if $n = -\infty$ by $\mathcal{F}_m$. We denote by
$Q_\omega$ the regular conditional probability distribution of $x_1$ given $\mathcal{F}_0$ under $Q$. The
rate function that will play a role here is

Definition 4.4. $H(Q;P) = E^Q h(Q_\omega;\alpha)$.

Although the quantity $H(Q;P)$ is defined through entropy it has several
variational formulae as well and we need them in order to establish some of its
properties as well as in the proof of the large deviation principle.

Let $A_n$ be the class of bounded measurable functions on $X^{(n)} = X \times \cdots \times X$, i.e.
functions $F(x_1, \dots, x_n)$, satisfying the condition

$$ \int_X \exp[F(x_1, \dots, x_n)]\,d\alpha(x_n) \le 1 \quad \text{for all } x_1, \dots, x_{n-1}. $$

Theorem 4.5.

$$ H(Q;P) = \sup_n \sup_{F \in A_n} E^Q[F]. $$

Moreover we can replace $A_n$ by the set $C_n$ of bounded continuous functions of $n$
variables in $A_n$ instead and we also have

$$ H(Q;P) = \sup_n \sup_{F \in C_n} E^Q[F]. $$

Proof: Let $F \in A_n$. Then

$$ E^Q[F] = E^Q[F(x_1, \dots, x_n)] $$
$$ = E^Q E^Q[F(x_1, \dots, x_n) \mid \mathcal{F}_{n-1}] $$
$$ \le E^Q\big[\log E^P[e^{F(x_1, \dots, x_n)} \mid \mathcal{F}_{n-1}]\big] + H(Q;P) $$
$$ \le H(Q;P). $$

To establish the reverse inequality we define $\tilde{Q}$ on the $\sigma$-field $\mathcal{F}_1$ by making $\tilde{Q} = Q$ on
$\mathcal{F}_0$ and making the first coordinate independent of the past $\mathcal{F}_0$ and having
distribution $\alpha$. Then $\tilde{Q}_\omega = \alpha$. Then we see that

$$ H(Q;P) = h_{\mathcal{F}_1}(Q;\tilde{Q}) $$
$$ = \sup_{F \text{ bounded, } \mathcal{F}_1 \text{ measurable}} \big[E^Q[F] - \log E^{\tilde{Q}} e^F\big] $$
$$ = \sup_n \sup_{F = F(x_{-n}, \dots, x_0, x_1)} \big[E^Q[F] - \log E^{\tilde{Q}} e^F\big] \quad \text{(by a variant of Lusin's theorem)} $$
$$ \le \sup_n \sup_F \Big[E^Q[F] - E^Q\Big[\log \int e^{F(\cdots, x_1)}\,d\alpha(x_1)\Big]\Big] $$
$$ = \sup_n \sup_{F \in A_n} E^Q[F]. $$

Another way of calculating $H(Q;P)$ is by the following theorem.


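Theorem 4.6.

$$ H(Q;P) = \sup_n \frac{1}{n} \sup_{F \in D_n} E^Q[F] $$

where

$$ D_n = \Big\{F : F = F(x_1, \dots, x_n),\ \int e^{F(x_1, \dots, x_n)}\,dP \le 1\Big\}. $$

Proof: Let us call $F(x_1, \dots, x_n)$ by $F_n(x_1, \dots, x_n)$ and define successively

$$ F_k(x_1, \dots, x_k) = \log \int e^{F_{k+1}(x_1, \dots, x_{k+1})}\,d\alpha(x_{k+1}). $$

Then we verify that $F_0 \le 0$ and

$$ \int e^{F_{k+1}(x_1, \dots, x_{k+1}) - F_k(x_1, \dots, x_k)}\,d\alpha(x_{k+1}) \le 1. $$

Therefore by Theorem 4.5

$$ E^Q F_{k+1} - E^Q F_k \le H(Q;P); $$

adding over $k = 0, 1, \dots, n-1$ and using $F_0 \le 0$ we have

$$ E^Q F_n = E^Q F \le nH(Q;P). $$

To prove the converse we note that by definition

$$ \sup_{F \in D_n} E^Q[F] = h_{\mathcal{F}^1_n}(Q;P). $$

We compute

$$ h_{\mathcal{F}^1_n}(Q;P) - h_{\mathcal{F}^1_{n-1}}(Q;P) = h_{\mathcal{F}^{-n+2}_1}(Q;P) - h_{\mathcal{F}^{-n+2}_0}(Q;P) \quad \text{(by stationarity)} $$
$$ = E^Q h\big(Q^{(n)}_\omega;\alpha\big), $$

where $Q^{(n)}_\omega$ denotes the conditional distribution of $x_1$ given $\mathcal{F}^{-n+2}_0$ under $Q$ and the last step uses Lemma 4.3.
As $n \to \infty$ the above term has a limit inferior of at least $E^Q h(Q_\omega;\alpha) = H(Q;P)$. This
establishes the theorem.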


We are now ready to prove our upper and lower bounds for establishing the large
deviation principle. We have the measure $\hat{Q}_n$ which is the distribution of $R_{n,\omega}$ under
$P$. We divide the theorem into many lemmas.

Theorem 4.7. For any closed set $C \subset M_S$

(4.3) $$ \limsup_{n\to\infty} \frac{1}{n} \log \hat{Q}_n(C) \le -\inf_{Q \in C} H(Q;P) $$

and for $G \subset M_S$ open

(4.4) $$ \liminf_{n\to\infty} \frac{1}{n} \log \hat{Q}_n(G) \ge -\inf_{Q \in G} H(Q;P). $$

Proof: Let us denote for any Borel set $A$

(4.5) $$ J(A) = \limsup_{n\to\infty} \frac{1}{n} \log \hat{Q}_n(A). $$

We will establish several properties of $J(A)$ as lemmas leading up to the proof of
(4.3). The lower bound (4.4) will be dealt with later.


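Lemma 4.8. Let $F(x_1, \dots, x_k) \in C_k$. Then for $n \ge 1$

$$ E^P\Big\{\exp\Big[\frac{1}{k}\big(F(x_1, \dots, x_k) + F(x_2, \dots, x_{k+1}) + \cdots + F(x_{nk}, \dots, x_{(n+1)k-1})\big)\Big]\Big\} \le 1. $$

Proof: We write

$$ F(x_1, \dots, x_k) + \cdots + F(x_{nk}, \dots, x_{(n+1)k-1}) = G_1 + G_2 + \cdots + G_k $$

where

$$ G_1 = F(x_1, \dots, x_k) + F(x_{k+1}, \dots, x_{2k}) + \cdots $$
$$ G_2 = F(x_2, \dots, x_{k+1}) + F(x_{k+2}, \dots, x_{2k+1}) + \cdots $$
$$ \vdots $$
$$ G_k = F(x_k, \dots, x_{2k-1}) + F(x_{2k}, \dots, x_{3k-1}) + \cdots. $$

Then

$$ E^P\Big\{\exp\Big[\frac{1}{k}\big(F(x_1, \dots, x_k) + \cdots + F(x_{nk}, \dots, x_{(n+1)k-1})\big)\Big]\Big\} = E^P \exp\Big[\frac{1}{k}(G_1 + \cdots + G_k)\Big] $$
$$ \le E^P\Big[\frac{1}{k}\big(e^{G_1} + \cdots + e^{G_k}\big)\Big] = \frac{1}{k} \sum_{j=1}^k E^P e^{G_j} \le 1. $$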
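
For a small alphabet the bound of Lemma 4.8 can be confirmed by enumerating all sequences. The sketch below uses $k = 2$, $n = 3$ and a two-point marginal; the function `F` is built by normalising an arbitrary function `G` so that $\int e^{F(x_1, x_2)}\,d\alpha(x_2) = 1$ for each $x_1$, which places it in the class $A_2$ (and in $C_2$, since every function on a finite space is continuous). All numerical values are illustrative assumptions.

```python
import itertools
import math

ALPHA = {0: 0.3, 1: 0.7}     # hypothetical one-dimensional marginal of P
K, N = 2, 3                  # window length k and number of blocks n

def G(x1, x2):
    return 1.5 * x1 * x2 - 0.4 * x2 + 0.2 * x1      # arbitrary bounded function

def F(x1, x2):
    """Normalise G so that the integral of exp F(x1, .) against alpha equals 1."""
    norm = sum(ALPHA[y] * math.exp(G(x1, y)) for y in ALPHA)
    return G(x1, x2) - math.log(norm)

def expectation():
    """E^P exp[(1/k) * sum_{i=1}^{nk} F(x_i, ..., x_{i+k-1})] by exact enumeration."""
    length = N * K + K - 1
    total = 0.0
    for xs in itertools.product(ALPHA, repeat=length):
        prob = math.prod(ALPHA[x] for x in xs)
        s = sum(F(xs[i], xs[i + 1]) for i in range(N * K))
        total += prob * math.exp(s / K)
    return total

print(expectation())
```

The printed expectation does not exceed one, in agreement with the convexity argument that replaces the exponential of an average by the average of the exponentials $e^{G_j}$.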

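Lemma 4.9. For any Borel set $A \subset M_S$

$$ J(A) \le -\sup_k \frac{1}{k} \sup_{F \in C_k} \inf_{Q \in A} E^Q[F]. $$

Proof:

$$ \Big|\frac{1}{n}\big[F(x_1, \dots, x_k) + \cdots + F(x_n, \dots, x_{n+k-1})\big] - \int F\,dR_{n,\omega}\Big| \le \frac{2(k-1)}{n}\|F\| $$

where $\|F\|$ is the sup norm of $F$.
Therefore from Lemma 4.8

$$ E^P\Big\{\exp\Big[\frac{1}{k}\big(F(x_1, \dots, x_k) + \cdots + F(x_n, \dots, x_{n+k-1})\big)\Big]\Big\} \le C $$

or

$$ E^{\hat{Q}_n}\Big\{\exp\Big[\frac{n}{k} \int F\,dQ\Big]\Big\} \le C, $$

so that

$$ \hat{Q}_n(A) \le C \exp\Big[-\frac{n}{k} \inf_{Q \in A} \int F\,dQ\Big]. $$

Taking logs, dividing by $n$ and taking limsup as $n \to \infty$

$$ J(A) \le -\frac{1}{k} \inf_{Q \in A} \int F\,dQ. $$

Since $F \in C_k$ and $k$ are arbitrary we have our result.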


Lemma 4.10. Let $K \subset M_S$ be compact and let $\varepsilon > 0$ be given. Then there exists an open
set $G_\varepsilon$ in $M_S$ such that $K \subset G_\varepsilon$ and

$$ J(G_\varepsilon) \le -\inf_{Q \in K} H(Q;P) + \varepsilon. $$

Proof: Let $\varepsilon > 0$ be given. For each $Q \in K$ there exists an integer $k(Q)$ and
$F_Q \in C_{k(Q)}$ such that

$$ \frac{1}{k(Q)} \int F_Q\,dQ \ge H(Q;P) - \varepsilon/2. $$

Since $\int F_Q\,dQ'$ is a continuous linear functional of $Q'$, there is a neighborhood $N_Q$ of $Q$
such that for $Q' \in N_Q$

$$ \frac{1}{k(Q)} \int F_Q\,dQ' \ge H(Q;P) - \varepsilon. $$

Therefore from Lemma 4.9

$$ J(N_Q) \le -H(Q;P) + \varepsilon. $$

$N_Q$ as $Q$ varies over $K$ is an open covering of $K$; let $N_{Q_1}, \dots, N_{Q_L}$ be a finite
subcover. Denoting by $G_\varepsilon$ the union of such a finite subcover, clearly $G_\varepsilon \supset K$ and

$$ J(G_\varepsilon) \le \max_{1 \le j \le L} J(N_{Q_j}) \le -\min_{1 \le j \le L} [H(Q_j;P) - \varepsilon] = -\min_{1 \le j \le L} H(Q_j;P) + \varepsilon \le -\inf_{Q \in K} H(Q;P) + \varepsilon. $$
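Lemma 4.11. Given any $\ell < \infty$ there exists a compact set $K_\ell \subset M_S$ such that

$$ J(K_\ell^c) \le -\ell. $$

Proof: For each $Q \in M_S$ let us denote by $q$ the marginal distribution of $x_0$. If
$B_\ell \subset M$ is a compact set of probability measures on $X$, then

$$ K_\ell = \{Q : q \in B_\ell\} \text{ is compact in } M_S. $$

We need therefore only estimate

$$ \hat{Q}_n\{Q : q \notin B_\ell\} $$

or

$$ P\Big\{\omega : \frac{\delta_{x_1} + \cdots + \delta_{x_n}}{n} \notin B_\ell\Big\}. $$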
From Prohorov's theorem we can take 


By = {αια(ος) > 1 - J 


eo ee ae 
7 or 1 
where $D_j^\ell \subset X$ are compact subsets of $X$ for each $j$ and $\ell$. Therefore
ôx 4 es’ ôx 
1 n 
Plo: ———— e Bj) 
n 
i Xyg,o 12 * t** xy o0? 
D7? pj: 


< ρω - > ο 
Js? X 

< I Pen) (by Lemma 4.12 with @ = 23°) 
1-2 


if we pick D$ such that a(D4) >1- antl? which is possible by Prohorov's theorem. 
If we assume ϐ > 1, then the lemma follows. 

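Lemma 4.12. Let $X_1, \dots, X_n$ be $n$ independent random variables taking the value 1 with
probability $\varepsilon$ and the value 0 with probability $1 - \varepsilon$. Then for any $\delta > 0$ and $\theta \ge 0$

$$ \mathrm{Prob}\Big[\frac{X_1 + \cdots + X_n}{n} \ge \delta\Big] \le e^{-n\theta\delta}\big(\varepsilon e^\theta + (1 - \varepsilon)\big)^n. $$

Proof: Apply Tchebychev's inequality to

$$ E\big[e^{\theta(X_1 + \cdots + X_n)}\big] = \big(\varepsilon e^\theta + 1 - \varepsilon\big)^n. $$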
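
The inequality of Lemma 4.12 can be compared with the exact binomial tail. In the sketch below the values of `n`, `eps`, `delta` and the range of `theta` are illustrative choices, and the function names are hypothetical.

```python
import math

def exact_tail(n, eps, delta):
    """P[(X_1 + ... + X_n) / n >= delta] when P[X_i = 1] = eps."""
    k0 = math.ceil(n * delta - 1e-12)   # small slack guards against rounding in n * delta
    return sum(math.comb(n, k) * eps ** k * (1.0 - eps) ** (n - k)
               for k in range(k0, n + 1))

def chernoff_bound(n, eps, delta, theta):
    """Right-hand side of Lemma 4.12."""
    return math.exp(-n * theta * delta) * (eps * math.exp(theta) + 1.0 - eps) ** n

n, eps, delta = 50, 0.1, 0.4
for theta in (0.5, 1.0, 2.0, 3.0):
    print(theta, exact_tail(n, eps, delta), chernoff_bound(n, eps, delta, theta))
```

Every choice of $\theta$ gives a valid bound; the freedom to choose $\theta$ is what is exploited when the lemma is applied with a parameter depending on $\ell$.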


Proof of (4.3). Given any closed set $C$ we pick an $\ell < \infty$ and the compact set $K_\ell$ of
Lemma 4.11. Then $K_\ell \cap C$ is compact and

$$ J(C) \le \max[J(K_\ell \cap C), J(K_\ell^c)] \le -\min\big[\inf_{Q \in C} H(Q;P), \ell\big]. $$

Letting $\ell \to \infty$ we obtain our result. We now work on the lower bound.

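Let us denote by $\psi(\omega)$ the Radon-Nikodym derivative of $Q_\omega$ with respect to $\alpha$ on
the $\sigma$-field $\mathcal{F}^1_1$. Then $\psi(\omega)$ can be thought of as a measurable function on $\mathcal{F}_1$ and
$E^Q \log \psi(\omega) = H(Q;P)$. Moreover if we denote by $T$ the shift on the space of sequences
then $\frac{dQ}{dP}$ on $\mathcal{F}^1_n$ is $\exp[\log \psi(\omega) + \log \psi(T\omega) + \cdots + \log \psi(T^{n-1}\omega)]$.

Lemma 4.13. Let $Q$ be any ergodic element in $M_S$. Then for any neighborhood $N$ of $Q$ in
$M_S$

$$ \liminf_{n\to\infty} \frac{1}{n} \log \hat{Q}_n[N] \ge -H(Q;P). $$

Proof: Assume $H(Q;P) < \infty$.

$$ \hat{Q}_n(N) = P\{R_{n,\omega} \in N\} $$
$$ \ge \int_{R_{n,\omega} \in N} \frac{dP}{dQ}\,dQ $$
$$ = \int_{R_{n,\omega} \in N} e^{-(\log \psi(\omega) + \cdots + \log \psi(T^{n-1}\omega))}\,dQ $$
$$ \ge \int_{\{R_{n,\omega} \in N\} \cap E_\delta} e^{-n(H(Q;P) + \delta)}\,dQ $$

where $E_\delta = \big\{\omega : \frac{1}{n} \sum_{j=0}^{n-1} \log \psi(T^j \omega) \le H(Q;P) + \delta\big\}$. By the ergodic theorem $Q\{\omega : R_{n,\omega} \in N\}$ as
well as $Q(E_\delta)$ tend to 1 as $n \to \infty$ and we are done.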


Lemma 4.14. If $Q \in M_S$ is arbitrary then $Q$ has an integral representation

$$ Q = \int_{M_e} Q'\,m_Q(dQ') $$

over the ergodic measures $M_e$ and

$$ H(Q;P) = \int_{M_e} H(Q';P)\,m_Q(dQ'). $$

Proof: From standard results in ergodic theory we know that the integral
representation is valid. Moreover the regular conditional probability has a version
$Q_\omega$ such that

$$ Q_\omega = Q'_\omega \quad \text{a.e. } Q' \text{ for } m_Q\text{-almost every } Q' \in M_e. $$

Therefore

$$ H(Q;P) = E^Q h(Q_\omega;\alpha) $$

clearly satisfies the lemma.

Now we prove the lower bound (4.4). If $Q \in M_S$ is arbitrary we can approximate
it by a finite linear combination of ergodic ones such that $Q \approx \sum_j \pi_j Q_j$ and
$H(Q;P) \approx \sum_j \pi_j H(Q_j;P)$. We can therefore assume without loss of generality that
$Q = \sum_j \pi_j Q_j$ with $Q_j$ ergodic. Let $n$ be given and define $n_j = \pi_j n$. For a given $\omega = \omega_1$
let $\omega_j = T^{n_1 + \cdots + n_{j-1}} \omega$. Then $R_{n,\omega} \approx \sum_j \pi_j R_{n_j,\omega_j}$, since the only difference is the
periodization at the end. Since the topology on $M_S$ is essentially convergence of
finite dimensional distributions, for a given finite dimensional range the effect of
the periodization goes to zero as $n \to \infty$. Therefore given a neighborhood $N$ of $Q$,
there are neighborhoods $N_j$ of $Q_j$ such that

$$ R_{n_j,\omega_j} \in N_j \text{ for all } j \implies R_{n,\omega} \in N. $$

Therefore

$$ P[R_{n,\omega} \in N] \ge \prod_j P[R_{n_j,\omega_j} \in N_j]. $$

Taking logs, dividing by $n$ and taking liminf as $n \to \infty$

$$ \liminf_{n\to\infty} \frac{1}{n} \log \hat{Q}_n(N) \ge -\sum_j \pi_j H(Q_j;P) \ge -H(Q;P) - \varepsilon $$

and we are done. We want to conclude this section by stating some properties of
various entropy functions:
Lemma 4.15. For fixed $\alpha$, $h(\beta;\alpha)$ is a lower semicontinuous convex function of $\beta$ in
the weak topology.

Proof:

$$ h(\beta;\alpha) = \sup_{F \in C(X)} \Big[\int F\,d\beta - \log \int e^F\,d\alpha\Big]. $$

The properties are now obvious.

Lemma 4.16. For fixed $P$, a product measure based on $\alpha$, $H(Q;P)$ is lower
semicontinuous in $Q$.

Proof:

$$ H(Q;P) = \sup_k \sup_{F \in C_k} \Big[\int F\,dQ\Big] $$

and the lemma is obvious.

Lemma 4.17. For fixed $\beta$

$$ \inf_{Q : q = \beta} H(Q;P) = h(\beta;\alpha) $$

and the inf is attained at the product measure.

Proof: It is clear that for the product measure $Q_\beta$

$$ H(Q_\beta;P) = h(\beta;\alpha). $$


F(x4)****F(x,) 
Cy contains functions e 1 τ with SeF () da(x) < 1. Therefore for any Q, 


$$ H(Q;P) \ge h(q;\alpha). $$
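
The inequality can be illustrated with a stationary two-state Markov chain $Q$. The sketch assumes that for such a chain the conditional distribution $Q_\omega$ of $x_1$ given the past is the transition row $p(x_0, \cdot)$, so that Definition 4.4 reduces to an average of relative entropies over the invariant distribution $q$. The matrices and helper names are illustrative assumptions.

```python
import math

ALPHA = [0.5, 0.5]                 # hypothetical marginal of the product measure P
P_Q = [[0.8, 0.2],                 # hypothetical transition matrix of a stationary
       [0.3, 0.7]]                 # two-state Markov chain Q

def kl(p, q):
    """Relative entropy of p with respect to q on a finite set."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def stationary(p):
    """Invariant distribution of a two-state transition matrix."""
    b, c = p[0][1], p[1][0]
    return [c / (b + c), b / (b + c)]

def H_markov(p):
    """E^Q h(Q_w; alpha) when Q_w = p(x_0, .), i.e. for a stationary Markov Q."""
    q = stationary(p)
    return sum(q[x] * kl(p[x], ALPHA) for x in range(2))

q = stationary(P_Q)
H_Q = H_markov(P_Q)
H_product = H_markov([q, q])       # product measure with the same marginal q
print(H_Q, H_product, kl(q, ALPHA))
```

The chain with dependence has strictly larger entropy than the product measure with the same marginal, whose entropy equals $h(q;\alpha)$, as Lemma 4.17 asserts.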


Lemma 4.18. For any $\ell < \infty$ and any $\alpha$

$$ \{q : h(q;\alpha) \le \ell\} \text{ is compact in } M. $$

Proof: From inequality (4.2) if $\alpha(A) \le \delta$ then $q(A) \le \eta$ where $\eta = \eta(\delta, \ell) \to 0$ as $\delta \to 0$
for each $\ell$. Since $\alpha$ is tight, by Prohorov's theorem $\{q : h(q;\alpha) \le \ell\}$ is uniformly
tight and hence compact.

Lemma 4.19. For any $\ell$, $\{Q : H(Q;P) \le \ell\}$ is compact in $M_S$.

Proof: As $Q$ varies over our set $q$ varies over a set contained in $\{q : h(q;\alpha) \le \ell\}$ and
is therefore conditionally compact. Therefore so is the set of measures $Q$.


Section 5. Markov Chains 

In this section we will assume the base measure $P$ to be based on a Markov chain
rather than a product measure. One difference however is that the $P$ measure will be
defined only on $F^0_\infty$, which is all that is needed. Moreover instead of a single $P$ we
have a family $P_{x_0}$ depending on the starting point $x_0$ at time zero. The transition
probability is $\pi(x,dy)$. We make the following assumption on $\pi(x,dy)$.

Hypothesis 1.

$\pi(x,dy)$ has the Feller property, i.e. for any bounded continuous function $f(\cdot)$ on $X$,

$$(\pi f)(x) = \int f(y)\,\pi(x,dy)$$

is bounded and continuous on $X$.

The random processes $R_{n,\omega}$ are defined as before and instead of $Q_n$ we now have
$Q_{n,x_0}$ depending on the starting point $x_0$ as well. We will describe the results in
this case and indicate in the proof only modifications needed in the earlier proof
for the independent case.

Definition 5.1. Given $\pi$ and $Q \in M$ we define

$$H(Q;\pi) = E^{Q}\big\{ h\big(Q_\omega;\ \pi(x_0,\cdot)\big) \big\}.$$

Here $x_0$ is thought of as a function of $\omega$ and then the relative entropy of $Q_\omega$ and
$\pi(x_0,\cdot)$ is calculated on the $\sigma$-field $F^1_1$ corresponding to $x_1$. The answer that
depends on $\omega$ is averaged with respect to $Q$.

In theorem 4.5 we should modify $A_n$ so that

$$A_n = \Big\{ F : F = F(x_1,\ldots,x_n) \ \text{and} \ \int \exp[F(x_1,\ldots,x_n)]\,\pi(x_{n-1},dx_n) \le 1 \ \ \forall\, x_1,\ldots,x_{n-1} \Big\}.$$

$C_n$ is defined accordingly. Then we have

Theorem 5.2.

$$H(Q;\pi) = \sup_n \sup_{F \in A_n} E^{Q}[F] = \sup_n \sup_{F \in C_n} E^{Q}[F].$$

Proof:
Proceeds in a manner identical to theorem 4.5 with minor obvious modifications.


We next define $D_n$ as

$$D_n = \Big\{ F : F = F(x_1,\ldots,x_n) \ \text{and} \ \int e^{F(x_1,\ldots,x_n)}\,dP_{x_0} \le 1 \ \ \forall\, x_0 \Big\}.$$

Then we have the analog of theorem 4.6.

Theorem 5.3.

$$H(Q;\pi) = \sup_n \frac{1}{n} \sup_{F \in D_n} E^{Q}[F].$$

Proof: We define for given $F = F(x_1,\ldots,x_n)$, $F_k = F_k(x_1,\ldots,x_k)$ successively for
$k < n$ by

$$F_k(x_1,\ldots,x_k) = \log \int e^{F_{k+1}(x_1,\ldots,x_{k+1})}\,\pi(x_k,dx_{k+1}),$$

starting from $F_n = F$. Then $E^{Q}[F_{k+1} - F_k] \le H(Q;\pi)$ by theorem 5.2. The proof is completed as before. Now
we start establishing the large deviation principle.


First we define for any $x_0 \in X$

(5.1)  $J_{x_0}(A) = \limsup_{n\to\infty} \frac{1}{n} \log Q_{n,x_0}(A).$

We then have

Lemma 5.4. For any compact set $K \subset M$ and any $\varepsilon > 0$ there exists a neighborhood
$G_\varepsilon \supset K$ such that

$$J_{x_0}(G_\varepsilon) \le -\inf_{Q \in K} H(Q;\pi) + \varepsilon.$$

Proof: Identical to lemmas 4.8, 4.9 and 4.10.

The main difference is only at this point. In order to go from compact $K$ to
closed $C$ we need to make a strong positive recurrence assumption on the transition
probability $\pi(x,dy)$.

Hypothesis 2

Let us suppose that there are functions $U(x)$ and $V(x)$ on $X$ with the following
properties:

a) $U(x) \ge 1$ for all $x$ and $(\pi U)(x)$ is bounded on compact subsets of $X$.
b) $V(x) = \log U(x) - \log (\pi U)(x)$ is bounded below (away from $-\infty$) and for any $\ell$,
$\{x : V(x) \le \ell\}$ is a totally bounded subset of $X$.

Under hypothesis 2 we can establish the analog of lemma 4.11.

Lemma 5.5. Given any $\ell < \infty$, there is a compact set $K_\ell \subset M$ such that
$J_{x_0}(K_\ell^{c}) \le -\ell$ for every $x_0 \in X$.

Proof: The proof of lemma 5.5 will depend on lemma 5.6 and will follow the lines of
the proof of lemma 4.11.


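Lemma 5.6. Given any $\ell$ and $j$ there is a compact set $D^\ell_j \subset X$ such that

$$P_{x_0}\Big[ \frac{1}{n}\Big( \chi_{(D^\ell_j)^c}(x_1) + \cdots + \chi_{(D^\ell_j)^c}(x_n) \Big) \ge \frac{1}{j} \Big] \le C e^{-\ell n j}$$

for all $j$ and $n$. Here $C$ is some fixed constant.

Proof: From hypothesis 2 we have

$$E^{P_{x_0}}\big[ e^{V(x_1) + \cdots + V(x_n)}\, U(x_{n+1}) \big] = (\pi U)(x_0).$$

Since $U \ge 1$ and $(\pi U)(x_0) < \infty$ we have

$$E^{P_{x_0}}\big[ e^{V(x_1) + \cdots + V(x_n)} \big] \le (\pi U)(x_0) < \infty.$$

If we take $D^\ell_j = \{x : V(x) \le \lambda\}$ for some $\lambda$ then

$$V(x_1) + \cdots + V(x_n) \ge \lambda \sum_{r=1}^{n} \chi_{(D^\ell_j)^c}(x_r) - nC_1,$$

where $-C_1$ is a lower bound for $V(\cdot)$. Therefore

$$E^{P_{x_0}} \exp\Big[ \lambda \sum_{r=1}^{n} \chi_{(D^\ell_j)^c}(x_r) \Big] \le C e^{nC_1}.$$

Therefore

$$P_{x_0}\Big[ \frac{1}{n} \sum_{r=1}^{n} \chi_{(D^\ell_j)^c}(x_r) \ge \frac{1}{j} \Big] \le C e^{nC_1} e^{-\lambda n/j}.$$

If we choose $\lambda = \ell j^2$ in $D^\ell_j$ we have our estimate (the factor $e^{nC_1}$ is absorbed by enlarging $\lambda$). If we therefore have
hypothesis 1 and 2 we can get the upper bound part of the large deviation principle.
Moreover one can check through the proof that all estimates are valid uniformly
provided $x_0$ varies over a compact subset of $X$. Now we start working on the lower
bound. We need another hypothesis.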
Hypothesis 3. The transition probability $\pi(x,dy)$ has a density $\pi(x,y)$ with respect
to a reference measure $\alpha$ such that

a) $\pi(x,y) > 0$ a.e. $\alpha$ for each $x \in X$ and
b) the map $x \to \pi(x,\cdot)$ is continuous as a map of $X$ into $L_1(\alpha)$.

Theorem 5.7. Let $Q \in M$ be ergodic. Then for any open $N$ containing $Q$

$$\liminf_{n\to\infty} \frac{1}{n} \log P_{x_0}\big[ R_{n,\omega} \in N \ \text{and} \ x_n \in K_2 \big] \ge -H(Q;\pi)$$

where $K_2$ is any compact set in $X$ with $\alpha(K_2) > 0$; and the limit is uniform for $x_0$
varying over any compact set in $X$.
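Proof: Let us pick $K_3$ such that $q(K_3) \ge \frac12$ where $q$ is the marginal of $Q$. Denoting
by

$$\psi(\omega) = \frac{dQ_\omega}{dP_{x_0(\omega)}}\Big|_{F^1_1}$$

where $x_0(\omega)$ is the coordinate of $\omega$ corresponding to time zero, we have

$$\frac{dQ_\omega}{dP_{x_0(\omega)}}\Big|_{F^1_n} = \exp\Big[ \sum_{j=0}^{n-1} \log \psi(T^j\omega) \Big].$$

Therefore if we take the set

$$E_{N,n} = \{\omega : R_{n,\omega} \in N \ \text{and} \ x_n(\omega) \in K_3\}$$

then

$$P_{x_0(\omega)}(E_{N,n}) \ge \int_{E_{N,n}} \exp\Big[ -\sum_{j=0}^{n-1} \log \psi(T^j\omega) \Big]\,dQ_\omega$$

$$\ge e^{-[H(Q;\pi)+\delta]n}\ Q_\omega\Big[ E_{N,n} \cap \Big\{\omega : \frac{1}{n}\sum_{j=0}^{n-1} \log\psi(T^j\omega) \le H(Q;\pi)+\delta \Big\} \Big].$$

Let us denote by $\phi(n,x)$ the quantity

$$\phi(n,x) = P_x(E_{N,n})\, e^{[H(Q;\pi)+\delta]n}.$$

Then

$$\phi(n,x_0(\omega)) \ge Q_\omega[E_{N,n} \cap D_{n,\delta}],$$

where

$$D_{n,\delta} = \Big\{\omega : \frac{1}{n}\sum_{j=0}^{n-1} \log\psi(T^j\omega) \le H(Q;\pi)+\delta \Big\}.$$

Taking expectations with respect to $Q$ we have

(5.2)  $\int \phi(n,x)\,dq(x) \ge Q[E_{N,n} \cap D_{n,\delta}] \to q(K_3) \ge \tfrac12$ as $n \to \infty$.

Therefore

$$\liminf_{n\to\infty} \int \phi(n,x)\,dq(x) \ge \tfrac12.$$

We can find a smaller neighborhood $N_1 \subset N$ such that if $R_{n-2,T\omega} \in N_1$ then $R_{n,\omega} \in N$
for $n$ sufficiently large. Therefore for large $n$

$$P_{x_0}(R_{n,\omega} \in N,\ x_n \in K_2) \ge P_{x_0}(R_{n-2,T\omega} \in N_1,\ x_{n-1} \in K_3,\ x_n \in K_2)$$

$$\ge \int P_x(E_{N_1,n-2})\,\pi(x_0,dx)\ \inf_{x \in K_3} \pi(x,K_2).$$

From our assumptions the last factor is strictly positive. We denote by $\theta$ some
lower bound for it. Then, with $\phi$ defined using $N_1$ in place of $N$,

$$P_{x_0}(R_{n,\omega} \in N,\ x_n \in K_2) \ge \theta\, e^{-[H(Q;\pi)+\delta](n-2)} \int \phi(n-2,x)\,\pi(x_0,dx).$$

It is now an elementary exercise that our assumptions and (5.2) imply that

$$\liminf_{n\to\infty} \int \phi(n,x)\,\pi(x_0,dx) > 0$$

and in fact uniformly over compact sets of starting points $x_0$. We finally have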


Theorem 5.8. For $C$ closed in $M$ and $G$ open in $M$

$$\limsup_{n\to\infty} \frac{1}{n} \log Q_{n,x}(C) \le -\inf_{Q \in C} H(Q;\pi),$$

$$\liminf_{n\to\infty} \frac{1}{n} \log Q_{n,x}(G) \ge -\inf_{Q \in G} H(Q;\pi).$$

Proof: All that remains is to pass from ergodic $Q$ to non-ergodic $Q$. This is carried
out exactly like the independent case. Instead of independence, in each one
of the time periods we make the process have its $R_{n,\omega}$ close to $Q_j$ and end up in a compact
set $K_2$. Since we can afford to take the infimum over the starting point $x_0 \in K_2$
at the next step, it is almost the same as independence.

We also have the results which are analogs of 4.15 through 4.19.

Lemma 5.9. For each fixed $\pi$, $H(Q;\pi)$ is lower semicontinuous in $Q \in M$.

Proof:

$$H(Q;\pi) = \sup_k \sup_{F \in C_k} \Big[ \int F\,dQ \Big]$$

and the Feller property ensures that the normalization procedure that defined $F_k$
inductively leaves the $\{C_k\}$ invariant. Since the functionals on the right are
continuous we have lower semicontinuity of $H(Q;\pi)$.


Lemma 5.10. For any $\beta \in M$

$$\inf_{Q:\,q=\beta} H(Q;\pi) = \inf_{\lambda \in M^{(2)}_\beta} h^{(2)}(\lambda;\lambda_0)$$

where $\lambda, \lambda_0$ are probability measures on $X \times X$ and $\lambda_0(dx,dy) = \beta(dx)\,\pi(x,dy)$. $M^{(2)}_\beta$
consists of all $\lambda$ such that the marginals of both components are $\beta$.

Proof: Starting from $\lambda$ we can construct a unique stationary Markov chain whose two
dimensional distribution is $\lambda$ at two consecutive time points. For such a Markov
chain $Q_\lambda$ one can compute

$$H(Q_\lambda;\pi) = h^{(2)}(\lambda;\lambda_0).$$

If $Q$ is not Markov then the $\tilde Q$ associated to the two dimensional marginal of $Q$ is
always Markov and

$$H(\tilde Q;\pi) \le H(Q;\pi).$$


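Lemma 5.11. For any $\beta \in M$

$$\inf_{\lambda \in M^{(2)}_\beta} h^{(2)}(\lambda;\lambda_0) = \sup_u \int \log \frac{u(x)}{(\pi u)(x)}\,d\beta(x)$$

where the supremum is taken over all bounded uniformly positive measurable
functions.

Proof: For the proof we do not need the Feller condition on $\pi$, so by a standard
result on Polish spaces we might as well assume that $X$ is compact. Then we may
restrict $u$ to bounded continuous functions. Suppose for some $u$,

$$\int \log \frac{u(x)}{(\pi u)(x)}\,d\beta(x) = \ell.$$

Then

$$\int \log \frac{u(y)}{(\pi u)(x)}\,\lambda(dx,dy) = \ell \quad \text{because } \lambda \in M^{(2)}_\beta;$$

on the other hand

$$\log \int \frac{u(y)}{(\pi u)(x)}\,\lambda_0(dx,dy) = \log \int \frac{u(y)}{(\pi u)(x)}\,d\beta(x)\,\pi(x,dy) = \log 1 = 0.$$

By definition of $h^{(2)}(\lambda;\lambda_0)$ we have now

$$\lambda \in M^{(2)}_\beta \ \Longrightarrow\ h^{(2)}(\lambda;\lambda_0) \ge \ell$$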
and we have the easy half. For the other half what we have to show is that if

(5.3)  $\inf_{\lambda \in M^{(2)}_\beta} h^{(2)}(\lambda;\lambda_0) \ge \ell$

then we can produce a continuous $u$ for which $\int \log \frac{u(x)}{(\pi u)(x)}\,d\beta(x) \ge \ell - \varepsilon$, where
$\varepsilon > 0$ is given. This requires the use of the minimax theorem. From (5.3) we have

$$\inf_{\lambda \in M^{(2)}_\beta} \sup_V \Big[ \int V(x,y)\,\lambda(dx,dy) - \log \int e^{V(x,y)}\,\lambda_0(dx,dy) \Big] \ge \ell.$$

By the standard minimax theorem we can interchange sup and inf so that

$$\sup_V \inf_{\lambda \in M^{(2)}_\beta} \Big[ \int V(x,y)\,\lambda(dx,dy) - \log \int e^{V(x,y)}\,\lambda_0(dx,dy) \Big] \ge \ell.$$

In other words given $\varepsilon > 0$ there is a $V$ such that

$$\inf_{\lambda \in M^{(2)}_\beta} \int V(x,y)\,\lambda(dx,dy) \ge \ell + \log \int e^{V(x,y)}\,\lambda_0(dx,dy) - \varepsilon.$$

By normalization we may assume the existence of a $V$ such that

(5.4)  $\ell + \log \int e^{V(x,y)}\,\lambda_0(dx,dy) \le \varepsilon$

and

(5.5)  $\int V(x,y)\,\lambda(dx,dy) \ge 0 \quad \forall\, \lambda \in M^{(2)}_\beta.$

We may rewrite (5.5) as

$$\inf_\lambda \sup_{\phi,\psi} \Big[ \int V(x,y)\,\lambda(dx,dy) + \int [\phi(x) + \psi(y)]\,\lambda(dx,dy) - \int [\phi(x) + \psi(x)]\,\beta(dx) \Big] \ge 0$$

because the sup is $0$ if $\lambda \in M^{(2)}_\beta$ and $\infty$ otherwise. Again by the minimax theorem (5.5)
implies

(5.6)  $\sup_{\phi,\psi} \inf_\lambda \Big[ \int V(x,y)\,\lambda(dx,dy) + \int [\phi(x) + \psi(y)]\,\lambda(dx,dy) - \int [\phi(x) + \psi(x)]\,\beta(dx) \Big] \ge 0$

which means that given any $\varepsilon > 0$, there is a pair $\phi, \psi$ such that (again by
normalization)

(5.7)  $\int \phi\,d\beta = \int \psi\,d\beta = 0$

and

(5.8)  $V(x,y) \ge \phi(x) + \psi(y) - \varepsilon \quad \forall\, x \text{ and } y.$

(5.4) and (5.8) yield

(5.9)  $\ell + \log \int e^{\phi(x) + \psi(y)}\,\beta(dx)\,\pi(x,dy) \le 2\varepsilon.$

If we call $e^{\psi} = u$ then (5.9) is the same as

(5.10)  $\log \int e^{\phi(x)}\,(\pi u)(x)\,\beta(dx) \le 2\varepsilon - \ell.$

By Jensen's inequality we get

$$\int \phi(x)\,\beta(dx) + \int \log (\pi u)(x)\,\beta(dx) \le 2\varepsilon - \ell;$$

since $\log u = \psi$, from (5.7) we obtain

$$\int \log \frac{u(x)}{(\pi u)(x)}\,\beta(dx) \ge \ell - 2\varepsilon$$

and we are done.


If we define

$$I_\pi(\beta) = \sup_u \int \log \frac{u(x)}{(\pi u)(x)}\,\beta(dx) = \inf_{Q:\,q=\beta} H(Q;\pi)$$

then

Lemma 5.12. $I_\pi(\beta)$ is lower semicontinuous and convex. Under Hypothesis 2 the set
$\{\beta : I_\pi(\beta) \le \ell\}$ is compact in $M$. And under the same hypothesis $\{Q : H(Q;\pi) \le \ell\}$ is
compact in $M$.

Proof: By standard truncation we will have

$$\int V(x)\,d\beta \le \ell$$

whenever $I_\pi(\beta) \le \ell$, where $V(x)$ is the function of hypothesis 2. By Tchebyshev bounds we obtain the
first part of our lemma. The second part follows trivially from the first part.
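
For a finite state space the functional $I_\pi(\beta) = \sup_u \int \log \frac{u}{\pi u}\,d\beta$ can be explored directly. The following is a minimal sketch, assuming a two-state transition matrix `P` chosen only for illustration (it is not from the text). It shows the rate vanishing at the invariant measure, where Jensen's inequality makes every bracket nonpositive, and being strictly positive at a non-invariant $\beta$.

```python
import math

def apply_kernel(P, u):
    """(pi u)(x) = sum_y pi(x, y) u(y) for a transition matrix P."""
    return [sum(P[x][y] * u[y] for y in range(len(u))) for x in range(len(P))]

def dv_objective(u, beta, P):
    """Integral of log(u / (pi u)) against beta, for a positive vector u."""
    pu = apply_kernel(P, u)
    return sum(b * math.log(ux / pux) for b, ux, pux in zip(beta, u, pu))

# Illustrative two-state chain (an assumption of this sketch).
P = [[0.9, 0.1], [0.4, 0.6]]
invariant = [0.8, 0.2]  # satisfies invariant * P = invariant
uniform = [0.5, 0.5]    # not invariant for P

# The objective is unchanged when u is multiplied by a constant, so on two
# states u can be normalized to (1, r); a grid in log r explores the supremum.
candidates = [[1.0, math.exp(k / 50.0)] for k in range(-150, 151)]

rate_at_invariant = max(dv_objective(u, invariant, P) for u in candidates)
rate_at_uniform = max(dv_objective(u, uniform, P) for u in candidates)
```

The grid search is only a crude stand-in for the supremum over all positive $u$; on larger state spaces a convex optimizer would be the natural replacement.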


Section 6. Stationary Gaussian Process 
For $P$ we take a stationary Gaussian process with mean 0 and covariance

$$E[x_n x_{n+k}] = \rho_k = \frac{1}{2\pi} \int_0^{2\pi} e^{ik\theta} f(\theta)\,d\theta$$

where $f(\theta)$ is a continuous nonnegative function with $f(0) = f(2\pi)$. We assume that
the process is nondeterministic so that $\int_0^{2\pi} \log f(\theta)\,d\theta$ is greater than $-\infty$.
We construct $R_{n,\omega}$ and $Q_n$ and we aim to show that a large deviation principle is
valid for $Q_n$ with a rate function

(6.1)  $H(Q;f) = E^{Q}\Big\{ \int_{-\infty}^{\infty} q(y|\omega) \log q(y|\omega)\,dy \Big\} + \frac12 \log 2\pi + \frac{1}{4\pi} \int_0^{2\pi} \frac{dG(\theta)}{f(\theta)} + \frac{1}{4\pi} \int_0^{2\pi} \log f(\theta)\,d\theta$

where $dG(\theta)$ is the spectral measure of $Q$, i.e.

$$E^{Q}[x_n x_{n+k}] = \frac{1}{2\pi} \int_0^{2\pi} e^{ik\theta}\,dG(\theta).$$

We will outline the basic steps involved in the proof of the large deviation
principle for $Q_n$ with the rate function provided by (6.1).


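Step 1. We represent the random process $\{x_k\}$ as a moving average of the form

$$x_k = \sum_{n=-\infty}^{\infty} a_{n-k}\,\xi_n$$

where $\xi_n$ are independent random variables and

$$\sqrt{f(\theta)} = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}.$$

The sequence $\{a_n\}$ is in $\ell_2(Z)$.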
Step 2. We approximate $a_n$ by $a_n^N = a_n\big(1 - \frac{|n|}{N}\big)$ for $|n| \le N$ and $a_n^N = 0$ otherwise. If
we write

$$g_N(\theta) = \sum_{n=-\infty}^{\infty} a_n^N e^{in\theta}$$

then $g_N(\theta) \to \sqrt{f(\theta)}$ uniformly by Fejér's theorem.
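
The Fejér damping of Step 2 can be illustrated on a concrete example. The following is a minimal sketch, assuming the illustrative choice $\sqrt{f(\theta)} = \sin(\theta/2)$ on $[0, 2\pi]$, whose Fourier coefficients are $a_n = \frac{2}{\pi(1 - 4n^2)}$; this spectral density and the function names are assumptions of the sketch, not taken from the text. It measures the sup-norm error of $g_N$ on a grid for increasing $N$.

```python
import math

def a(n):
    """Fourier coefficient a_n of sqrt(f)(theta) = sin(theta / 2) on [0, 2*pi]."""
    return 2.0 / (math.pi * (1.0 - 4.0 * n * n))

def g_N(theta, N):
    """Fejer-damped sum over |n| <= N of a_n * (1 - |n|/N) * e^{i n theta}."""
    total = a(0)
    for n in range(1, N + 1):
        # a_n = a_{-n}, so the two exponentials combine into a cosine term.
        total += 2.0 * a(n) * (1.0 - n / N) * math.cos(n * theta)
    return total

def sup_error(N, grid_points=400):
    """Largest deviation of g_N from sqrt(f) over an equally spaced grid."""
    thetas = [2.0 * math.pi * k / grid_points for k in range(grid_points + 1)]
    return max(abs(g_N(t, N) - math.sin(t / 2.0)) for t in thetas)

# Sup-norm errors for a few truncation levels N.
errors = {N: sup_error(N) for N in (10, 40, 160)}
```

In this example the error is largest at $\theta = 0$, where $\sqrt{f}$ has a corner, and it decreases as $N$ grows, in line with the uniform convergence used in Step 2.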


Step 3. Let us define a map $\tau$ on $\Omega = \prod_{j \in Z} R$ by

$$(\tau\omega)(k) = \sum_n a_{n-k}\,\omega(n);$$

then $P = P_0\tau^{-1}$ where $P_0$ is the product measure based on standard Gaussians. If we
define $\tau_N$ by

$$(\tau_N\omega)(k) = \sum_n a^N_{n-k}\,\omega(n)$$

then $P_N = P_0\tau_N^{-1}$ is Gaussian with mean 0 and spectral density

$$f_N(\theta) = |g_N(\theta)|^2.$$

Step 4. For each $N$, $R_{n,\tau_N\omega}$ and $R_{n,\omega}\tau_N^{-1}$ are very close as $n \to \infty$. In fact any
difference between them is only due to periodization. They are both random
stationary processes and the large deviation principle for $R_{n,\omega}\tau_N^{-1}$ implies the large
deviation principle for $R_{n,\tau_N\omega}$. Moreover since we have a large deviation principle
for $R_{n,\omega}$ when the basic distribution is $P_0$, we have one for $R_{n,\omega}\tau_N^{-1}$ since $Q \to Q\tau_N^{-1}$
is a continuous map of $M$ into $M$. The rate function for $R_{n,\tau_N\omega}$, whose distribution we
call $Q_n^N$, is given by

$$\inf_{Q':\,Q'\tau_N^{-1} = Q} H(Q';1).$$

Step 5. We calculate

$$\inf_{Q':\,Q'\tau_N^{-1} = Q} H(Q';1) = H(Q;f_N),$$

where $H(Q;f)$ for any $f$ is given by formula (6.1). Step 5 is mainly a
calculation.


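Step 6.

$$\lim_{N\to\infty} \limsup_{n\to\infty} \frac{1}{n} \log P_0\{ d(R_{n,\tau_N\omega}, R_{n,\tau\omega}) \ge \varepsilon \} = -\infty$$

for every $\varepsilon > 0$. This is again a calculation based on routine estimates for
$\tau_N\omega - \tau\omega$.

Step 7.

$$Q_n[C] = P[R_{n,\omega} \in C] = P_0[R_{n,\tau\omega} \in C] \le P_0[R_{n,\tau_N\omega} \in C^\varepsilon] + P_0[d(R_{n,\tau_N\omega}, R_{n,\tau\omega}) \ge \varepsilon].$$

Taking logs, dividing by $n$, taking limsup and then letting $N \to \infty$ first and then
$\varepsilon \to 0$, we obtain

$$\limsup_{n\to\infty} \frac{1}{n} \log Q_n(C) \le -\lim_{\varepsilon\to0} \liminf_{N\to\infty} \inf_{Q \in C^\varepsilon} H(Q;f_N)$$

and similarly for $G$ open

$$\liminf_{n\to\infty} \frac{1}{n} \log Q_n(G) \ge -\liminf_{N\to\infty} \inf_{Q \in G} H(Q;f_N).$$

Step 8.

$$\lim_{\varepsilon\to0} \liminf_{N\to\infty} \inf_{Q\in C^\varepsilon} H(Q;f_N) \ge \inf_{Q\in C} H(Q;f),$$

$$\liminf_{N\to\infty} \inf_{Q\in G} H(Q;f_N) \le \inf_{Q\in G} H(Q;f).$$

These two statements are proved by the explicit formulas for $H(Q;f_N)$ and $H(Q;f)$ and
the explicit definition of $f_N$ in terms of $f$. Finally

Step 9. $H(Q;f)$ is a rate function.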


Section 7. Continuous Time Markov Processes 

We will now assume that we have a Markov process with transition probabilities
$p(t,x,dy)$ on a state space $X$ with the following properties: The state space $X$ is
Polish. Moreover:

Hypothesis 1. The semigroup $(T_t f)(x) = \int f(y)\,p(t,x,dy)$ maps bounded continuous
functions $C(X)$ into itself. For any starting point $x$ the measure $P_x$ on the space of
trajectories lives on $\Omega_0 = D[0,\infty)$ which is given the topology of Skorohod
convergence on finite intervals. The map $x \to P_x$ is continuous.

Hypothesis 2. There exists a sequence $u_n$ of functions in the domain $D$ of the
infinitesimal generator $L$ of the process with the following properties:

a) $u_n(x) \ge 1$
b) $\sup_{x \in K} \sup_n u_n(x) < \infty$ for each compact $K \subset X$
c) $\lim_{n\to\infty} u_n(x) = u(x)$ exists for each $x$. If $V_n(x) = -\frac{(Lu_n)(x)}{u_n(x)}$ then
d) $V_n(x) \ge -C$ for some $C$ for all $n, x$.
e) $V(x) = \lim_{n\to\infty} V_n(x)$ exists
f) for each $\ell < \infty$, $\{x : V(x) \le \ell\}$ has compact closure in $X$.

Hypothesis 3. $p(1,x,dy)$ has density $p(1,x,y)$ for every $x$ with respect to a
reference measure $\alpha$ on $X$. Moreover $p(1,x,y) > 0$ a.e. $\alpha$ for each $x$. In addition the
map $x \to p(1,x,\cdot)$ is continuous as a map of $X$ into $L_1(\alpha)$.

We denote by $\Omega$ the Skorohod space $D(-\infty,\infty)$ and by $M$ the space of stationary
processes on $\Omega$. $M$ is a Polish space under weak convergence of processes on finite
intervals, and the projection map $\omega \to \omega(t)$, while not continuous in general, is
continuous at almost all points with respect to every $Q \in M$. [$Q$ has no fixed points
of discontinuity.] For each $\omega \in \Omega_0$ we define $R_{t,\omega}$ by the continuous analog of $R_{n,\omega}$.
We extend the trajectory $\omega(s)$, $0 \le s \le t$, periodically on either side to get a
periodic orbit under the shift $\theta_s$ of period $t$ and take $R_{t,\omega}$ as the orbital measure.
For each $x \in X$ we have the distribution $Q_{t,x}$ of $R_{t,\omega}$ under $P_x$. We are interested in
a large deviation principle for $Q_{t,x}$ on $M$ with some rate function $H(Q)$. We will
suppress the dependence of $H(Q)$ on $p(t,x,dy)$, which will be a fixed semigroup for
our discussion.

The proof follows the discrete case very closely and we outline the proof 
giving details only where there are new aspects in the proof. 


Definition 7.1. Given $Q \in M$ we define

$$H(Q,T) = E^{Q}\big\{ h_{F^0_T}\big(Q_\omega;\ P_{\omega(0)}\big) \big\}.$$

Lemma 7.2.

$H(Q,T) = T\,H(Q)$ for some $0 \le H(Q) \le \infty$.

Proof: One checks by stationarity of $Q$ and $p(t,x,dy)$ that
$H(Q,T_1 + T_2) = H(Q,T_1) + H(Q,T_2)$. Since $H(Q,T) \ge 0$ it follows that $H(Q,T)$ is linear
in $T$.

For each $T$ we define $A_T$ by

$$A_T = \{F : F \text{ is } F^0_T \text{ measurable and } E^{P_x}\{\exp[F(\omega)]\} \le 1 \ \ \forall\, x\},$$

$$C_T = A_T \cap \{F : E^{Q}(F) \text{ is a continuous linear functional of } Q \text{ in } M\}.$$

We then have

Theorem 7.3.

$$H(Q) = \sup_{T>0} \frac{1}{T} \sup_{F \in A_T} E^{Q}\{F\} = \sup_{T>0} \frac{1}{T} \sup_{F \in C_T} E^{Q}\{F\}.$$

Proof: Same as theorem 5.3.

We can define

$$B_T = \{F : F \in F^{-\infty}_T \ \text{and} \ E^{P_{\omega(0)}}\{e^{F(\omega)}\} \le 1 \text{ everywhere}\}.$$

In the above definition integration with respect to $P_{\omega(0)}$ is carried out over $F^0_T$
only, on each fiber of $F^{-\infty}_0$. We also have the analog of theorem 5.2 proved in exactly
the same manner.

Theorem 7.4

$$H(Q) = \sup_{T>0} \frac{1}{T} \sup_{F \in B_T} E^{Q}\{F\}.$$


We now start proving the large deviation principle. 


We define for $x_0 \in X$ and $A \subset M$

$$J_{x_0}(A) = \limsup_{t\to\infty} \frac{1}{t} \log Q_{t,x_0}(A).$$

We then have

Theorem 7.5. For any compact set $K \subset M$ and any $\varepsilon > 0$ there exists a neighborhood
$G_\varepsilon \supset K$ such that

$$J_{x_0}(G_\varepsilon) \le -\inf_{Q \in K} H(Q) + \varepsilon.$$

Proof: Identical to the discrete case.

One main difference in the continuous case is that processes whose marginals
vary over a compact set do not necessarily come from a compact set of processes. We need
to control the modulus of continuity as well.

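Theorem 7.6. Let $A$ closed in $M$ be such that the family of one dimensional marginals
of $Q$ as $Q$ varies over $A$ forms a tight family of measures on $X$. Then

$$J_{x_0}(A) \le -\inf_{Q \in A} H(Q).$$

Proof: Let us denote by $A_1$ the family of marginals of $A$. Given any sequence $\varepsilon_n > 0$,
there exists $K_n \subset X$ such that $q(K_n) \ge 1 - \varepsilon_n$ for $q \in A_1$. Since $x \to P_x$ is weakly
continuous, given $\eta_n > 0$ there exists $C_n \subset D[0,1]$ such that $C_n$ is compact there and
$P_x(C_n) \ge 1 - \eta_n$ for all $x \in K_n$. Denoting by $\tilde C_n$ the complement of $C_n$ it is easily
checked that for all $x \in X$

$$E^{P_x}\{\exp[\nu\,\chi_{K_n}(\omega(0))\,\chi_{\tilde C_n}(\omega)]\} \le 1 + \eta_n(e^{\nu} - 1).$$

From the continuous analog of lemma 4.8

$$E^{P_x}\Big\{\exp\Big[\nu \int_0^t \chi_{K_n}(\omega(s))\,\chi_{\tilde C_n}(\theta_s\omega)\,ds\Big]\Big\} \le \exp\{t \log(1 + \eta_n(e^{\nu} - 1))\}.$$

Therefore, allowing for an error of $\frac{1}{t}$ for periodization,

$$Q_{t,x_0}\Big(A \cap \Big\{Q : Q(\tilde C_n) \ge \frac{1}{t} + 2\varepsilon_n\Big\}\Big) \le \exp[t \log[1 + \eta_n(e^{\nu} - 1)] - \varepsilon_n \nu t].$$

Pick $\lambda > 0$, $\nu = \lambda n^2$, $\varepsilon_n = \frac{1}{n}$ and $\eta_n = \exp[-\lambda n^2]$. Then

$$Q_{t,x_0}\Big(A \cap \Big\{Q : Q(\tilde C_n) \ge \frac{1}{t} + \frac{2}{n}\Big\}\Big) \le e^{t\log 2 - \lambda n t}.$$

If we let

$$A_t = \Big\{Q : Q(\tilde C_n) \le \frac{1}{t} + \frac{2}{n} \ \text{ for all } n \ge 1\Big\}$$

then

$$Q_{t,x_0}(A \cap A_t^c) \le e^{t\log 2} \sum_{n \ge 1} e^{-\lambda n t}.$$

Therefore

(7.1)  $\limsup_{t\to\infty} \frac{1}{t} \log Q_{t,x_0}(A \cap A_t^c) \le \log 2 - \lambda.$

It is easy to check that $A_\infty = \bigcap_t A_t$ is compact in $M$ and if $G \supset A \cap A_\infty$ is open then
$A \cap A_t \subset G$ for $t$ sufficiently large. Theorem 7.5 and (7.1) provide a proof of Theorem
7.6. If we now assume Hypothesis 2 one can obtain easily

$$E^{P_{x_0}}\Big\{\exp\Big[\int_0^t V(\omega(s))\,ds\Big]\Big\} \le C$$

for every $t > 0$. With this estimate the proof of the upper bound for closed sets
proceeds exactly like the discrete case.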
Lower bound: The only essential difference with the discrete case is the ergodic
theorem. If

$$R(t,\omega) = \frac{dQ_\omega}{dP_{\omega(0)}}\Big|_{F^0_t} = \exp[\psi(t,\omega)]$$

then $\psi(t+s,\omega) = \psi(t,\omega) + \psi(s,\theta_t\omega)$ is an additive functional. To establish the
ergodic theorem almost everywhere we need to show

$$E^{Q} \sup_{0 \le t \le 1} |\psi(t,\omega)| < \infty.$$

This is the content of the following lemma:

Lemma 7.7. Let $\alpha, \beta$ be two probability measures on a measurable space $(X,F)$. Let
$F_t$, $0 \le t \le 1$, be an increasing family of subfields with $F_1 = F$. Let $h(\beta;\alpha) < \infty$ and

$$R(t,\omega) = \frac{d\beta}{d\alpha}\Big|_{F_t}, \qquad \psi(\omega,t) = \log R(t,\omega).$$

Then

$$E^{\beta}\Big[\sup_{0\le t\le1} |\psi(\omega,t)|\Big] < \infty.$$

Proof: From standard martingale inequalities

$$\beta\Big\{\omega : \inf_{0\le t\le1} \psi(\omega,t) \le -\ell\Big\} = \beta\Big\{\omega : \inf_{0\le t\le1} R(t,\omega) \le e^{-\ell}\Big\} \le e^{-\ell}.$$

We therefore only have to show

$$E^{\beta}\Big[\sup_{0\le t\le1} \psi(\omega,t)\Big] < \infty.$$

But from entropy inequalities it is sufficient to show that

$$E^{\alpha}\Big[\sup_{0\le t\le1} e^{\psi(\omega,t)}\Big] < \infty$$

or

$$E^{\alpha}\Big[\sup_{0\le t\le1} R(t,\omega)\Big] < \infty.$$

Since $R$ is a martingale we need only the integrability of $R(\log R)^{+}$, which is of course
true because $h(\beta;\alpha) < \infty$. Now the lower bound is completed as before and we have

Theorem 7.8. Under hypotheses (1), (2) and (3) a large deviation principle holds
with the rate function $H(Q)$. In particular $H(Q)$ is a rate function.


We can define for $\mu \in M$

$$I(\mu) = -\inf_{u > 0,\ u \in D} \int \frac{(Lu)(x)}{u(x)}\,\mu(dx).$$

Assuming that the domain is big enough we have, if

$$I_h(\mu) = -\inf_{u>0} \int \log \frac{(T_h u)(x)}{u(x)}\,\mu(dx),$$

then

$$I_h(\mu) \le h\,I(\mu)$$

and

$$\lim_{h\to0} \frac{1}{h} I_h(\mu) = I(\mu).$$

We can use this to prove the contraction principle:

Theorem 7.9

$$\inf_{Q:\,q=\mu} H(Q) = I(\mu).$$

Proof: If we denote by $Q_h$ the Markov chain at times that are multiples of $h$, with
invariant measure $\mu$, which is obtained in lemmas 5.10 and 5.11, then the entropy of $Q_h$
with respect to the basic $\pi_h$ Markov chain with initial distribution $\mu$ is given by
$I_h(\mu)$ per time gap $h$. Therefore on the $h$ "grid" in the unit interval
$h(Q_h; P_{\mu,h}) \le \frac{1}{h} I_h(\mu) \le I(\mu)$. We can fill in the gaps of the grid by the conditional
distributions of bridges of span $h$. This leads to the measure $P_\mu$ as the filled in $P_{\mu,h}$
and some measure $\tilde Q_h$ for the filled in $Q_h$: $h(\tilde Q_h; P_\mu) = h(Q_h; P_{\mu,h}) \le I(\mu)$. It is easy to
show from this entropy estimate the tightness of $\tilde Q_h$, and the limit $Q$ is stationary,
has marginal $\mu$ and $H(Q) \le I(\mu)$. The other half of the theorem is just as trivial as
the discrete case.

Finally if $p(t,x,dy)$ has a symmetric density $p(t,x,y)$ with respect to a
reference measure $\alpha$ then

Theorem 7.10. $I(\mu) < \infty$ if and only if $\mu \ll \alpha$ and, if $f = \frac{d\mu}{d\alpha}$, then $\sqrt{f} \in L_2(\alpha)$ is in
the domain of $(-L)^{1/2}$ in $L_2(\alpha)$. In that case

$$I(\mu) = \|(-L)^{1/2}\sqrt{f}\|_2^2.$$


Section 8. Application to the Problem of the Wiener Sausage 


A problem that comes up in the study of density of states for Schrödinger
operators with certain random potentials near the edge of the energy spectrum is the
following:

Let $\beta(t)$ be $d$-dimensional Brownian motion starting from the origin. Let $\varepsilon > 0$
be fixed. Consider

$$C_t = \{x : x = \beta(s) \text{ for some } 0 \le s \le t\},$$

$$C_t^\varepsilon = \bigcup_{x \in C_t} S(x,\varepsilon) = \{x : |x - \beta(s)| < \varepsilon \text{ for some } 0 \le s \le t\}.$$

$C_t$ is just the image of the Wiener path up to time $t$, and $C_t^\varepsilon$ is the sausage around
it of radius $\varepsilon$.

The problem is to show that

(8.1)  $\lim_{t\to\infty} \frac{1}{t^{d/(d+2)}} \log E\{e^{-\nu|C_t^\varepsilon|}\} = -k(\nu,d)$

exists and is nonzero. Here $|C_t^\varepsilon|$ is the $d$-dimensional volume of the sausage $C_t^\varepsilon$ up
to time $t$. The actual physical problem involves the Brownian motion that is
conditioned to return to the origin at time $t$. But for large $t$, one can see easily
that the difference between the free Brownian motion and the conditional Brownian
motion is small enough that the formula (8.1) is unaffected by it. We will study
only the free Brownian motion.

We will first carry out a Brownian change of scale so that $t^{d/(d+2)}$ appears
naturally.

Let us replace $\beta(s)$, $0 \le s \le t$, by

$$t^{1/(d+2)}\,\beta(t^{-2/(d+2)} s) \quad \text{for } 0 \le s \le t,$$

which is again a Brownian motion. Therefore the distribution of $|C_t^\varepsilon|$ is the same as
that of $t^{d/(d+2)}\,|C^{\varepsilon t^{-1/(d+2)}}_{t^{d/(d+2)}}|$. If we let $\tau = t^{d/(d+2)}$, then

$$E\{e^{-\nu|C_t^\varepsilon|}\} = E\{\exp[-\tau\nu\,|C_\tau^{\varepsilon\tau^{-1/d}}|]\}.$$

The problem therefore reduces to showing that

(8.2)  $\lim_{t\to\infty} \frac{1}{t} \log E\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\} = -k(\nu,d)$

exists and is nonzero.


A basic fact in what follows is the behavior of

$$P[\beta(s) \in G \text{ for } 0 \le s \le t]$$

where $G$ is a smooth bounded open set containing the origin. If $L(t,\omega)$ is the random
measure representing the occupation time of the Brownian motion, i.e., if

$$L(t,\omega)(A) = \frac{1}{t} \int_0^t \chi_A(\beta(s))\,ds,$$

then

$$\beta(s) \in G \text{ for } 0 \le s \le t \iff \operatorname{supp} L(t,\omega) \subset G.$$

We can therefore estimate

$$P[\beta(s) \in G \text{ for } 0\le s\le t] \le P[\operatorname{supp} L(t,\omega) \subset \bar G].$$

Since the set $\{\mu : \operatorname{supp}\mu \subset \bar G\}$ is a compact set, we can estimate

$$\limsup_{t\to\infty} \frac{1}{t} \log P[\beta(s) \in G \text{ for } 0 \le s \le t] \le -\inf_{\mu:\,\mu(\bar G)=1} I(\mu)$$

$$\le -\inf_{f = 0 \text{ off } G,\ f \ge 0,\ \int f\,dx = 1} \frac{1}{8} \int \frac{|\nabla f|^2}{f}\,dx$$

$$= -\inf_{g = 0 \text{ off } G,\ \int g^2\,dx = 1} \frac{1}{2} \int |\nabla g|^2\,dx = -\lambda(G).$$

The lower estimate

$$\liminf_{t\to\infty} \frac{1}{t} \log P[\beta(s) \in G \text{ for } 0 \le s \le t] \ge -\lambda(G)$$

is also easy to derive. See [1] for details. We can combine the two halves in the
form of a lemma:

Lemma 8.1. For nice open sets $G$

$$\lim_{t\to\infty} \frac{1}{t} \log P[\beta(s) \in G \text{ for } 0\le s\le t] = -\lambda(G).$$


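Lower bound. Let $G$ be a bounded open set containing the origin with smooth
boundary. Let $\lambda(G)$ be the first eigenvalue of $-\frac12\Delta$ in $G$ with Dirichlet boundary
conditions. Then according to Lemma 8.1

$$P[\beta(s) \in G \text{ for } 0 \le s \le t] = \exp[-t\lambda(G) + o(t)].$$

If $\beta(s) \in G$ for $0 \le s \le t$, then

$$|C_t^{\varepsilon t^{-1/d}}| \le |G| + o(1) \quad \text{as } t \to \infty.$$

Therefore

$$\liminf_{t\to\infty} \frac{1}{t} \log E\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\} \ge -(\nu|G| + \lambda(G)).$$

Taking the infimum over all such sets $G$, we find

$$\liminf_{t\to\infty} \frac{1}{t} \log E\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\} \ge -k(\nu,d)$$

where

(8.3)  $k(\nu,d) = \inf_G [\nu|G| + \lambda(G)].$

We have proved

Theorem 8.2.

$$\liminf_{t\to\infty} \frac{1}{t} \log E\{\exp(-\nu t\,|C_t^{\varepsilon t^{-1/d}}|)\} \ge -k(\nu,d).$$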


We now turn to the

Derivation of the upper bound. Let us replace $R^d$ by the $d$-dimensional torus $T^d_\ell$
and consider Brownian motion on the torus. We will denote by $E_\ell$ expectation with
respect to the Brownian motion on the torus of size $\ell$. Since any set in $R^d$
projected to $T^d_\ell$ has volume no larger than the original volume,

$$E\{\exp(-\nu t\,|C_t^{\varepsilon t^{-1/d}}|)\} \le E_\ell\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\}.$$

We will show that

$$\lim_{\ell\to\infty} \limsup_{t\to\infty} \frac{1}{t} \log E_\ell\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\} \le -k(\nu,d).$$

Upper bounds on the torus. Let $\phi(x)$ be a function with support $\{x : |x| \le \varepsilon\}$
which is nonnegative and has $\int \phi(x)\,dx = 1$. Then, if $\phi_t(x) = t\,\phi(x t^{1/d})$,

$$C_t^{\varepsilon t^{-1/d}} = \{x : (L_t * \phi_t)(x) > 0\}$$

where $L_t$ is the occupation distribution and $*$ denotes convolution. We will denote
by $f_t$,

$$f_t = L_t * \phi_t,$$

the mollified local time. The problem we have reduces to two lemmas:

Lemma 8.3.

$$\limsup_{t\to\infty} \frac{1}{t} \log E_\ell\{\exp(-\nu t\,|x : f_t > 0|)\} \le -\inf_f [I(f) + \nu|x : f > 0|] = -\inf_G [\nu|G| + \lambda_\ell(G)],$$

and

Lemma 8.4.

$$\lim_{\ell\to\infty} \inf_G [\nu|G| + \lambda_\ell(G)] = \inf_G [\nu|G| + \lambda(G)].$$

Of the two lemmas, the second is a standard approximation lemma involving
truncation methods. We will not carry out the proof, but will only refer to [1].
We will sketch a proof of Lemma 8.3. If we consider the $L_1$ topology for densities
on $T^d_\ell$, then $|x : f > 0|$ is a lower semicontinuous functional of the density $f$. In
view of Theorem 2.2, it is sufficient to prove the large deviation principle for $f_t$
in the $L_1$ topology with a rate function $I(f)$.

Lemma 8.5. Let $\psi$ be any mollifier, i.e., a smooth probability density. Then

$$\limsup_{t\to\infty} \frac{1}{t} \log P[f_t * \psi \in C] \le -\inf_{f:\,f*\psi \in C} I(f)$$

for any $C$ closed in $L_1$.

Proof: The map $f \to f * \psi$ is continuous from $M$ with weak topology to $L_1$ with
norm topology. So the large deviation principle in the weak topology for $L_t$, which
implies the large deviation principle in the weak topology for $f_t$, is converted into
a large deviation principle for $f_t * \psi$ in the norm topology of $L_1$. Theorem 2.3
provides the precise proof.

We now state without proof Lemma 8.6. We will then state and prove Lemma 8.7,
which will imply Lemma 8.3 and our main result. Finally we will prove Lemma 8.6.

Lemma 8.6.

$$\limsup_{t\to\infty} \frac{1}{t} \log P[\,\|f_t * \psi - f_t\|_1 \ge \rho\,] \le -k_\rho(\psi)$$

where $k_\rho(\psi) \to \infty$ as $\psi \to \delta_0$ for each $\rho > 0$.

Proof: The proof will be given after the proof of Lemma 8.7.

Lemma 8.7. The large deviation principle holds for $f_t$ with the rate function
$I(f)$ in the space $L_1$ with norm topology.

Proof: Upper bound.

$$P[f_t \in C] \le P[f_t * \psi \in C^\rho] + P[\,\|f_t - f_t * \psi\|_1 \ge \rho\,].$$

Therefore

$$\limsup_{t\to\infty} \frac{1}{t} \log P[f_t \in C] \le -\inf\Big\{ \inf_{f:\,f*\psi \in C^\rho} I(f),\ k_\rho(\psi) \Big\}.$$

Letting $\psi \to \delta_0$ and then $\rho \to 0$, we get the upper bound, since

$$\lim_{\psi\to\delta_0} \inf_{f:\,f*\psi\in C^\rho} I(f) = \inf_{f \in C^\rho} I(f) \quad\text{and}\quad \lim_{\rho\to0} \inf_{f\in C^\rho} I(f) = \inf_{f \in C} I(f),$$

provided $C$ is closed in $L_1$.

Lower bound. For an open set $G$ around $f$,

$$P[f_t \in G] \ge P[f_t * \psi \in G_1] - P[\,\|f_t - f_t*\psi\|_1 \ge \rho\,]$$

where $G_1$ is a smaller open set around $f$ such that the sphere around $G_1$ of radius $\rho$
is contained in $G$. The result is again obvious from Lemma 8.6.

From Lemma 8.7 we obtain Lemma 8.3 by an application of Theorem 2.2. If we now
combine it with the lower bound, i.e. Theorem 8.2, and take Lemma 8.4 for granted,
then we have

Theorem 8.8.

$$\lim_{t\to\infty} \frac{1}{t} \log E\{\exp[-\nu t\,|C_t^{\varepsilon t^{-1/d}}|]\} = -k(\nu,d)$$

where $k(\nu,d)$ is given by (8.3).


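We now turn to the

Proof of Lemma 8.6.

$$\|f_t * \psi - f_t\|_1 = \sup_{\|g\|_\infty \le 1} \Big| \int (L_t * \phi_t * \psi - L_t * \phi_t)(x)\,g(x)\,dx \Big| = \sup_{\|g\|_\infty \le 1} \Big| \int h_t(x)\,L_t(dx) \Big|$$

where

$$h_t(x) = (g * \phi_t * \psi - g * \phi_t)(x).$$

(We have assumed that $\phi_t$ and $\psi$ are symmetric.)
The map $g \to \theta$ defined by $\theta = g * \phi_t$ is a compact map of the unit ball $\|g\|_\infty \le 1$.
Therefore for any $\rho > 0$ we can find a finite number $N = N(t,\rho)$ of functions
$\theta_1, \ldots, \theta_N$ such that the image of the unit ball is covered by spheres around
$\theta_1, \ldots, \theta_N$ of radius $\rho/4$. We can assume that $\theta_1, \ldots, \theta_N$ are all bounded by 1 as well.
Then

$$\|f_t * \psi - f_t\|_1 \le \frac{\rho}{2} + \sup_{1\le i\le N} \int (\theta_i(x) - (\theta_i * \psi)(x))\,L_t(dx),$$

$$P[\,\|f_t*\psi - f_t\|_1 \ge \rho\,] \le N \sup_{1\le i\le N} P\Big[ \int \chi_i(x)\,L_t(dx) \ge \frac{\rho}{2} \Big]$$

$$= N \sup_{1\le i\le N} P\Big[ \int_0^t \chi_i(\beta(s))\,ds \ge \frac{\rho}{2}\,t \Big]$$

$$\le N e^{-z\rho t/2} \sup_{1\le i\le N} E_\ell\Big\{\exp\Big[z \int_0^t \chi_i(\beta(s))\,ds\Big]\Big\}$$

where $\chi_i(x) = \theta_i(x) - (\theta_i*\psi)(x)$.

One can show that for any $\chi$ with $|\chi| \le 2$

$$E_\ell\Big\{\exp\Big[z\int_0^t \chi(\beta(s))\,ds\Big]\Big\} \le C_z \exp[t\,\lambda_\ell(z\chi)]$$

where $\lambda_\ell(z\chi)$ is the largest eigenvalue of

$$\tfrac12\Delta + z\chi \quad \text{on } T^d_\ell.$$

If, for each $\rho > 0$, $N(t,\rho) \le \exp[D_\rho t]$ for some $D_\rho$, then

$$\frac{1}{t} \log P[\,\|f_t * \psi - f_t\|_1 \ge \rho\,] \le D_\rho - \frac{z\rho}{2} + \sup_{\chi:\ \chi = \theta - \theta*\psi,\ \|\theta\|_\infty \le 1} \lambda_\ell(z\chi) + \frac{1}{t}\log C_z.$$

One verifies that $\sup \lambda_\ell(z\chi) \to 0$ as $\psi \to \delta_0$ for each $z > 0$. Therefore

$$\limsup_{\psi\to\delta_0} \limsup_{t\to\infty} \frac{1}{t} \log P[\,\|f_t*\psi - f_t\|_1 \ge \rho\,] \le D_\rho - \frac{z\rho}{2},$$

and by letting $z \to \infty$ we will obtain our lemma. We now need only the estimation of
$N(\rho,t)$ to complete the proof of Lemma 8.6.

$$\theta(x) = t \int g(y)\,\phi((x-y)t^{1/d})\,dy,$$

$$|\theta(x_1) - \theta(x_2)| \le \int |\phi((x_1 - x_2)t^{1/d} + y) - \phi(y)|\,dy,$$

$$\sup_{|x_1 - x_2| \le h} |\theta(x_1) - \theta(x_2)| \le \omega(h t^{1/d})$$

where $\omega$ is the $L_1$ modulus of continuity of $\phi$. Therefore

$$|\theta(x_1) - \theta(x_2)| \le \eta \quad \text{if } h t^{1/d} \le \eta', \ \text{i.e., if } h \le \eta' t^{-1/d}.$$

We can divide the torus $T^d_\ell$ into small cubes of size $\eta' t^{-1/d}$ and we will then have
of the order of $t/(\eta')^d$ cubes. In order to cover the unit ball, we need step functions that are
constant on cubes, and an easy estimate provides the bound

$$N(t,\rho) \le \Big(\frac{C}{\rho}\Big)^{t/(\eta')^d}.$$

This almost completes the proof of Lemma 8.6.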
Finally we need to show that

$$\inf_G [\lambda(G) + \nu|G|] = k(\nu,d) > 0.$$

If we expand a region by a factor $\sigma$, then $\lambda(\sigma G) = (1/\sigma^2)\lambda(G)$ and $|\sigma G| = \sigma^d |G|$. Then

$$\inf_{\sigma>0} \Big[ \frac{\lambda(G)}{\sigma^2} + \nu\sigma^d|G| \Big] = c(\nu,d)\,|G|^{2/(d+2)}\,[\lambda(G)]^{d/(d+2)}$$

where $c(\nu,d)$ can be calculated explicitly. Therefore

$$k(\nu,d) = c(\nu,d) \inf_{|G|=1} [\lambda(G)]^{d/(d+2)}.$$

A rearrangement argument tells us that the infimum is attained when $G$ is the sphere
of unit volume in $R^d$. This calculates $k(\nu,d)$ explicitly, and $k(\nu,d) > 0$. For
details see [1].
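
The scaling computation can be made explicit. Minimizing $\lambda/\sigma^2 + \nu\sigma^d|G|$ over $\sigma > 0$ by elementary calculus gives $c(\nu,d) = \nu^{2/(d+2)}\big[(d/2)^{2/(d+2)} + (2/d)^{d/(d+2)}\big]$. The following is a minimal sketch that compares this closed form with a brute-force grid minimization; the function names and test values are assumptions of the sketch. In dimension one, taking $G$ to be an interval of unit length, for which the first Dirichlet eigenvalue of $-\frac12\Delta$ is $\pi^2/2$, the formula gives $k(\nu,1) = \frac32(\pi\nu)^{2/3}$.

```python
import math

def c_const(nu, d):
    """Closed form of c(nu, d) from minimizing lam/sigma^2 + nu*sigma^d*vol over sigma."""
    return nu ** (2.0 / (d + 2)) * (
        (d / 2.0) ** (2.0 / (d + 2)) + (2.0 / d) ** (d / (d + 2.0))
    )

def closed_form_infimum(lam, vol, nu, d):
    """c(nu, d) * vol^(2/(d+2)) * lam^(d/(d+2))."""
    return c_const(nu, d) * vol ** (2.0 / (d + 2)) * lam ** (d / (d + 2.0))

def grid_infimum(lam, vol, nu, d):
    """Brute-force minimum of lam/sigma^2 + nu*sigma^d*vol over a log-spaced grid."""
    sigmas = (math.exp(k / 1000.0) for k in range(-5000, 5001))
    return min(lam / s ** 2 + nu * s ** d * vol for s in sigmas)

# Dimension one: unit interval, first Dirichlet eigenvalue of -(1/2) d^2/dx^2.
lam_unit_interval = math.pi ** 2 / 2.0
k_one_dim = closed_form_infimum(lam_unit_interval, 1.0, 1.0, 1)

# A second, arbitrary parameter choice in dimension three (illustrative values).
closed_3d = closed_form_infimum(3.0, 2.0, 0.7, 3)
grid_3d = grid_infimum(3.0, 2.0, 0.7, 3)
```

The grid minimization is only a sanity check of the algebra; it says nothing about which shape $G$ is optimal, which is the content of the rearrangement argument.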


Section 9. The Polaron Problem 
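A problem that comes up in statistical mechanics, known as the polaron problem,
leads to the following question concerning Brownian motion. Does

(9.1)  $\lim_{t\to\infty} \frac{1}{t} \log E\Big\{\exp\Big[\alpha \int_0^t\!\!\int_0^t \frac{e^{-|s-\sigma|}}{|\beta(s)-\beta(\sigma)|}\,d\sigma\,ds\Big]\Big\} = g(\alpha)$

exist, where $\beta(\cdot)$ is the three-dimensional tied-down Brownian motion in the interval
$[0,t]$? And how does $g(\alpha)$ behave for large $\alpha$? A conjecture by Pekar states that

$$\lim_{\alpha\to\infty} \frac{g(\alpha)}{\alpha^2} = \sup_{\phi \in L_2(R^3),\ \|\phi\|_2 = 1} \Big[ 2\int\!\!\int \frac{\phi^2(x)\,\phi^2(y)}{|x-y|}\,dx\,dy - \frac{1}{2} \int |\nabla\phi|^2\,dx \Big].$$

We will use our methods to prove the conjecture.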


First we note that 


t t t t 
e" |9-σ] e^ (870) 
| | ds do = 2 | dg | ds 
[xCo)=xts)] [x (G»»x(o) 
0 0 0 
t 
e^ ($70) 
= 2 | do | ds + o(t) 
[x (s)=x(o) 
0 σ 
t 
a | F(8,w)ds + o(t) 
0 


where θα is the shift and 
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By our large deviation results, one expects g(a) to exist in (9.1) and to be given 


by the variational formula 


g(a) = sup [2cE%F(w) - H(Q)] 
Q 


where H(Q) is the entropy relative to Brownian motion of the stationary process Q
and Q varies over all stationary processes with values in $R^3$. There are two
technical problems here. The first is the fact that we have tied-down Brownian
motion and not free Brownian motion. For t large there is very little difference,
and this can be made precise. The details are in [3]. A more serious problem is
the fact that Brownian motion does not satisfy the conditions for obtaining upper
bounds. The lower bound, however, follows painlessly by our methods. To get upper
bounds, one notices that if we replace Brownian motion by the Ornstein-Uhlenbeck
process with generator


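$$\frac{1}{2}\Delta - \varepsilon\,x\cdot\nabla$$

for some small $\varepsilon > 0$, the theory applies, and moreover the expectation for Brownian
motion is dominated by the expectation for the OU process for every $\varepsilon > 0$. If $H_\varepsilon(Q)$
is the entropy relative to the OU process and $H(Q)$ is the entropy relative to
Brownian motion, then for any Q

$$H_\varepsilon(Q) = H(Q) + \frac{\varepsilon^{2}}{2}\int\|x(0)\|^{2}\,dQ - \frac{3}{2}\,\varepsilon \;\ge\; H(Q) - \frac{3}{2}\,\varepsilon .$$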
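
The relation between the two entropies can be traced to the Cameron-Martin-Girsanov formula. The following is a heuristic sketch, not the argument of [3]: it assumes that $Q$ is stationary with $E^{Q}|x(0)|^{2} < \infty$, and that the entropies are rates per unit time. The symbols $P$ and $P^{\varepsilon}$ are notation introduced here for illustration only.

```latex
% Notation for this sketch only: P = Brownian motion, P^{\varepsilon} = OU process
% with generator \tfrac12\Delta - \varepsilon\, x\cdot\nabla on R^3, both on the window [0,t].
\begin{aligned}
\log\frac{dP^{\varepsilon}}{dP}
 &= -\varepsilon\int_0^t x(s)\cdot dx(s) - \frac{\varepsilon^{2}}{2}\int_0^t |x(s)|^{2}\,ds
 && \text{(Girsanov)} \\
 &= -\frac{\varepsilon}{2}\bigl(|x(t)|^{2} - |x(0)|^{2}\bigr) + \frac{3\varepsilon}{2}\,t
    - \frac{\varepsilon^{2}}{2}\int_0^t |x(s)|^{2}\,ds
 && \text{(It\^o's formula, } d = 3\text{)} \\
E^{Q}\Bigl[\log\frac{dP^{\varepsilon}}{dP}\Bigr]
 &= \frac{3\varepsilon}{2}\,t - \frac{\varepsilon^{2}}{2}\,t\,E^{Q}|x(0)|^{2}
 && \text{(stationarity of } Q\text{)} \\
H_\varepsilon(Q) - H(Q)
 &= -\lim_{t\to\infty}\frac{1}{t}\,E^{Q}\Bigl[\log\frac{dP^{\varepsilon}}{dP}\Bigr]
  = \frac{\varepsilon^{2}}{2}\,E^{Q}|x(0)|^{2} - \frac{3\varepsilon}{2} .
\end{aligned}
```

The quadratic term is nonnegative, which is why replacing $H_\varepsilon(Q)$ by $H(Q)$ costs at most a term of order $\varepsilon$ that disappears in the limit.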


Therefore 
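$$\begin{aligned}
\limsup_{t\to\infty}\frac{1}{t}\log E\left\{\exp\left[\alpha\int_0^t\!\!\int_0^t\frac{e^{-|s-\sigma|}}{|\beta(s)-\beta(\sigma)|}\,d\sigma\,ds\right]\right\}
&\le \sup_{Q}\left[\,2\alpha E^{Q}F(\omega) - H_\varepsilon(Q)\right] \\
&\le \sup_{Q}\left[\,2\alpha E^{Q}F(\omega) - H(Q)\right] + \frac{3}{2}\,\varepsilon .
\end{aligned}$$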


By letting $\varepsilon \to 0$ we obtain that the limit (9.1) exists and g is given by

(9.2)   $$g(\alpha) = \sup_{Q}\,\left[\,2\alpha\,E^{Q}F(\omega) - H(Q)\right] .$$


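Using (9.2) and Brownian scaling, one can get

(9.2')   $$\frac{g(\alpha)}{\alpha^{2}} = \sup_{Q}\left[\,2E^{Q}\left\{\frac{1}{\alpha^{2}}\int_0^\infty\frac{e^{-t/\alpha^{2}}}{|x(t)-x(0)|}\,dt\right\} - H(Q)\right],$$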


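and now we have to see what happens to

$$E^{Q}\left\{\frac{1}{\alpha^{2}}\int_0^\infty\frac{e^{-t/\alpha^{2}}}{|x(t)-x(0)|}\,dt\right\}$$

as $\alpha\to\infty$. Writing $q(t,dx,dy)$ for the two-dimensional distribution of x(0) and x(t) under the
stationary process Q, and $q(dx)$ for the one-dimensional marginal of Q, we have

$$\lim_{\alpha\to\infty}\frac{1}{\alpha^{2}}\int_0^\infty e^{-t/\alpha^{2}}\,dt\int\!\!\int\frac{q(t,dx,dy)}{|x-y|}
= \lim_{t\to\infty}\int\!\!\int\frac{q(t,dx,dy)}{|x-y|}
= \int\!\!\int\frac{q(dx)\,q(dy)}{|x-y|} .$$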


This is not quite correct. However, if Q is ergodic, the independence of x(0), x(t)
in an average sense, for $t \to \infty$, is enough to give the final answer. This argument
is essentially correct.
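
The Brownian scaling step behind (9.2') can be spelled out as follows. This is a sketch under stated assumptions: it uses $F(\omega) = \int_0^\infty e^{-s}/|x(s)-x(0)|\,ds$, treats the entropy as a rate per unit time, and introduces the rescaled path $y$ and its law $Q'$ purely as illustrative notation.

```latex
% Notation for this sketch: y(t) = \alpha\, x(t/\alpha^{2}); Q' is the law of y when x has law Q.
% The map x -> y sends Brownian motion to Brownian motion.
\begin{aligned}
F(\omega) &= \int_0^\infty \frac{e^{-s}}{|x(s)-x(0)|}\,ds
  = \int_0^\infty \frac{\alpha\, e^{-s}}{|y(\alpha^{2}s)-y(0)|}\,ds
  = \frac{1}{\alpha}\int_0^\infty \frac{e^{-t/\alpha^{2}}}{|y(t)-y(0)|}\,dt , \\
H(Q) &= \alpha^{2}\,H(Q')
  \qquad\text{(one unit of $x$-time is $\alpha^{2}$ units of $y$-time)} , \\
2\alpha\,E^{Q}F - H(Q)
 &= \alpha^{2}\left[\,2E^{Q'}\Bigl\{\frac{1}{\alpha^{2}}\int_0^\infty
     \frac{e^{-t/\alpha^{2}}}{|y(t)-y(0)|}\,dt\Bigr\} - H(Q')\right] .
\end{aligned}
```

Since $Q \mapsto Q'$ is a bijection on stationary processes, taking the supremum on both sides and dividing by $\alpha^{2}$ gives (9.2'). The factor $\alpha^{-2}e^{-t/\alpha^{2}}\,dt$ is a probability weight on $[0,\infty)$ that spreads out as $\alpha\to\infty$, which is why only the long-time behavior of $q(t,dx,dy)$ matters in the limit.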

There is also a serious problem of interchanging the sup and the limit on $\alpha$. If we
could carry this out we would have


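(9.4)   $$\lim_{\alpha\to\infty}\frac{g(\alpha)}{\alpha^{2}} = \sup_{Q}\left[\,2\int\!\!\int\frac{q(dx)\,q(dy)}{|x-y|} - H(Q)\right] = \sup_{q}\left[\,2\int\!\!\int\frac{q(dx)\,q(dy)}{|x-y|} - I(q)\right]$$

by the contraction principle. Since

$$I(q) = \frac{1}{8}\int\frac{|\nabla f|^{2}}{f}\,dx$$

if $q(dx) = f(x)\,dx$ and $I(q) = \infty$ otherwise, the variational formula in (9.4) reduces
to Pekar's conjecture. Incidentally, the unbounded nature of the function $1/|x|$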


causes additional technical problems that need to be handled. All this has been 


rigorously justified, and the details can be found in [3]. 
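
The reduction of (9.4) to Pekar's formula is a one-line substitution. The sketch below assumes the normalization $I(q) = \frac{1}{8}\int |\nabla f|^{2}/f\,dx$ for $q(dx) = f(x)\,dx$ and writes $f = \phi^{2}$; the function $\phi$ is the one appearing in Pekar's conjecture.

```latex
% Substitution f = \phi^{2} with \phi \ge 0 and \int \phi^{2}\,dx = 1, so that q(dx) = \phi^{2}(x)\,dx.
\begin{aligned}
\nabla f &= 2\phi\,\nabla\phi , \\
I(q) &= \frac{1}{8}\int\frac{|\nabla f|^{2}}{f}\,dx
  = \frac{1}{8}\int\frac{4\phi^{2}\,|\nabla\phi|^{2}}{\phi^{2}}\,dx
  = \frac{1}{2}\int|\nabla\phi|^{2}\,dx , \\
2\int\!\!\int\frac{q(dx)\,q(dy)}{|x-y|} - I(q)
 &= 2\int\!\!\int\frac{\phi^{2}(x)\,\phi^{2}(y)}{|x-y|}\,dx\,dy - \frac{1}{2}\int|\nabla\phi|^{2}\,dx .
\end{aligned}
```

The constraint that $q$ is a probability measure becomes $\|\phi\|_{2} = 1$, which is the normalization in the supremum.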


Section 10. Large deviations and laws of the iterated logarithm 


Let $\ell(t,\cdot)$ be the local time of the one-dimensional Brownian motion defined by

$$\ell(t,y) = \int_0^t \delta(\beta(s)-y)\,ds .$$

One knows that $\ell(t,y)$ is jointly continuous in t and y. If we define

$$\hat\ell(t,y) = \frac{1}{\sqrt{t}}\,\ell(t,\sqrt{t}\,y)$$

and

$$\tilde\ell(t,y) = \frac{1}{\sqrt{t\log\log t}}\;\ell\!\left(t,\sqrt{\frac{t}{\log\log t}}\;y\right)$$

then the distribution of $\hat\ell(t,\cdot)$ is independent of t by Brownian scaling. One can
get functional laws of the iterated logarithm for $\tilde\ell(t,\cdot)$ by showing that the set of
limit points of $\tilde\ell(t,\cdot)$ as $t\to\infty$ is precisely the set of subprobability densities
$p(y)$ with

$$\int p(y)\,dy \le 1 \quad\text{and}\quad \frac{1}{8}\int\frac{[p'(y)]^{2}}{p(y)}\,dy \le 1 .$$

In particular we can take functionals F which are nice and obtain

$$\limsup_{t\to\infty} F(\tilde\ell(t,\cdot)) = \sup_{p\in C} F(p(\cdot))$$

where C is the set of limit points described earlier.


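If we take $F(p(\cdot)) = p(0)$ we obtain

$$\limsup_{t\to\infty}\frac{\ell(t,0)}{\sqrt{t\log\log t}} = \sqrt{2}\quad\text{a.e.}$$

If we take $F(p(\cdot)) = \inf\,[\,l : \int_{-l}^{l} p(y)\,dy = 1\,]$ then we obtain (taking liminf rather than
limsup)

$$\liminf_{t\to\infty}\sqrt{\frac{\log\log t}{t}}\;\sup_{0\le s\le t}|\beta(s)| = \frac{\pi}{\sqrt{8}} .$$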
This is the so-called "other law of iterated logarithm". There are several other 


examples that one can think of. For some of the details see [2]. 
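
The two constants can be checked directly from the variational description of the limit set. The sketch below assumes that the limit points are the densities $p$ with $\int p\,dy \le 1$ and $\frac{1}{8}\int (p')^{2}/p\,dy \le 1$, and writes $p = \phi^{2}$; the extremal functions shown are illustrative choices made here, not taken from [2].

```latex
% With p = \phi^{2}: \tfrac18\int (p')^{2}/p\,dy = \tfrac12\int(\phi')^{2}\,dy \le 1,
% i.e. \|\phi'\|_{2} \le \sqrt{2}, together with \|\phi\|_{2} \le 1.
\begin{aligned}
\text{(i)}\quad 2\,\phi(0)^{2}
 &= \int_{-\infty}^{0} 2\phi\,\phi'\,dy - \int_{0}^{\infty} 2\phi\,\phi'\,dy
 \;\le\; 2\,\|\phi\|_{2}\,\|\phi'\|_{2} \;\le\; 2\sqrt{2} , \\
 &\text{so } p(0) \le \sqrt{2}, \text{ with equality for } \phi(y) = 2^{1/4}\,e^{-\sqrt{2}\,|y|} . \\
\text{(ii)}\quad \int(\phi')^{2}\,dy
 &\ge \Bigl(\frac{\pi}{2l}\Bigr)^{2}\int\phi^{2}\,dy
 \quad\text{if } \phi = 0 \text{ outside } [-l,l] \text{ and } \int\phi^{2}\,dy = 1 , \\
 &\text{so } \frac{\pi^{2}}{8\,l^{2}} \le 1, \text{ i.e. } l \ge \frac{\pi}{\sqrt{8}},
  \text{ with equality for } \phi(y) = l^{-1/2}\cos\!\Bigl(\frac{\pi y}{2l}\Bigr) .
\end{aligned}
```

Step (i) gives the supremum of $p(0)$ over the limit set, and step (ii) gives the infimum of the half-width $l$, which is why the second law is stated with liminf.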


Concluding Remarks 

The large deviation theory that we have developed during these lectures depends 
on rather stringent assumptions on the transition probabilities p(t,x,dy). These 
assumptions are strong enough to ensure the existence of at most one invariant 
probability measure for the Markov Process. If we were to drop this strong 
ergodicity assumption then the large deviation rates, even when they exist, could
start to depend on the starting point. To be more precise, if the Markov Process
were to admit several invariant probability measures, the extremals among them being
ergodic, then the large deviation rate for the types of sets that we had considered
before, namely

$$P_x[L(t,\omega,\cdot)\in A] ,$$

could have the following type of behavior:

$$P_x[L(t,\omega,\cdot)\in A] = \exp[-t\,I_\alpha(A) + o(t)]$$

for almost all x with respect to $\alpha$. Here $\alpha$ is an ergodic invariant probability measure and
$I_\alpha(\cdot)$ is computed in terms of a rate function depending on $\alpha$. We can also start
with initial distribution $\alpha$ and then

$$P_\alpha[L(t,\omega,\cdot)\in A] = \exp[-t\,\bar I_\alpha(A) + o(t)]$$

where $\bar I_\alpha(\cdot)$ is computed in terms of a slightly different rate function also
depending on $\alpha$. $I_\alpha$ takes care of large deviations in the evolution, whereas $\bar I_\alpha$ takes
care of large deviations in the initial conditions as well. There are some
interesting examples of infinite particle systems where computations have been made
to illustrate such behavior. The relevant references are [6], [7] and [8].


Bibliographical Remarks: 

The results outlined here appeared originally in several articles. There are 
now several sources available that provide a general survey of the large deviation 
theory along with a list of references: Varadhan [10], [11], Stroock [9] and Ellis 
[5] are good sources. The missing details in some of the applications of Sections
6, 8, 9, 10 can be found in [1], [2], [3] and [4].